Abstract

Three decades of research in communication complexity have led to the invention of a number of 
techniques to lower bound randomized communication complexity. The majority of these techniques 
involve properties of large submatrices (rectangles) of the truth-table matrix defining a
communication problem. The only technique that does not quite fit is information complexity, which has been
investigated over the last decade. Here, we connect information complexity to one of the most powerful 
"rectangular" techniques: the recently-introduced smooth corruption (or "smooth rectangle") bound. We 
show that the former subsumes the latter under rectangular input distributions. We conjecture that this 
subsumption holds more generally, under arbitrary distributions, which would resolve the long-standing 
direct sum question for randomized communication. 

As an application, we obtain an optimal $\Omega(n)$ lower bound on the information complexity, under
the uniform distribution, of the so-called orthogonality problem (ORT), which is in turn closely related
to the much-studied Gap-Hamming-Distance (GHD). The proof of this bound is along the lines of recent 
communication lower bounds for GHD, but we encounter a surprising amount of additional technical 
detail. 

1 Introduction 

The basic, and most widely-studied, notion of communication complexity deals with problems in which two 
players — Alice and Bob — engage in a communication protocol designed to "solve a problem" whose input 
is split between them. We shall focus exclusively on this model here, and we shall be primarily concerned 
with the problem of computing a Boolean function $f : \mathcal{X} \times \mathcal{Y} \to \{-1,1\}$. As is often the case, we are most
interested in lower bounds. 



1.1 Lower Bound Techniques and the Odd Man Out 

The preeminent textbook in the field remains that of Kushilevitz and Nisan [KN97], which covers the basics 
as well as several advanced topics and applications. Scanning that textbook, one finds a number of lower 
bounding techniques, i.e., techniques for proving lower bounds on $D(f)$ and $R(f)$, the deterministic and
randomized (respectively) communication complexities of $f$. Some of the more important techniques are
the fooling set technique, log rank, discrepancy and corruption.¹ Research postdating the publication of the
book has produced a number of other such techniques, including the factorization norms method [LS09], the
pattern matrix method [She08], the partition bound and the smooth corruption² bound [JK10]. Notably, all of
these techniques ultimately boil down to a fundamental fact called the rectangle property. One way of stating
it is that each fiber of a deterministic protocol, defined as a maximal set of inputs $(x,y) \in \mathcal{X} \times \mathcal{Y}$ that result
in the same communication transcript, is a combinatorial rectangle in $\mathcal{X} \times \mathcal{Y}$. The aforementioned lower
bound techniques ultimately invoke the rectangle property on a protocol that computes $f$; for randomized
lower bounds, (the easy direction of) Yao's minimax lemma also comes into play.

*Work supported in part by NSF Grant IIS-0916565.

1 Though the corruption technique is discussed in Kushilevitz and Nisan, the term "corruption" is due to Beame et al. [BPSW06].
The technique has also been called "one-sided discrepancy" and "rectangle method" [Kla03] by other authors.
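
To make the rectangle property concrete, here is a minimal Python sketch (an illustration only, not part of the original development). It runs a toy two-message deterministic protocol on 2-bit inputs, groups the inputs by transcript, and checks that every fiber is a combinatorial rectangle. The names `toy_protocol` and `is_rectangle` are hypothetical and chosen just for this example.

```python
from itertools import product
from collections import defaultdict

def toy_protocol(x, y):
    """Hypothetical deterministic protocol on 2-bit inputs.

    Alice sends the parity of x; Bob replies with the AND of that bit
    and his first input bit. The transcript is the pair of messages.
    """
    a = (x[0] + x[1]) % 2      # Alice's message depends only on x
    b = a & y[0]               # Bob's message depends only on y and the history
    return (a, b)

def is_rectangle(pairs):
    """A set S of (x, y) pairs is a rectangle iff S = S1 x S2."""
    rows = {x for x, _ in pairs}
    cols = {y for _, y in pairs}
    return set(pairs) == set(product(rows, cols))

inputs = list(product([0, 1], repeat=2))

# Group all input pairs by the transcript they produce (the fibers).
fibers = defaultdict(list)
for x, y in product(inputs, inputs):
    fibers[toy_protocol(x, y)].append((x, y))

all_rectangles = all(is_rectangle(s) for s in fibers.values())
print(all_rectangles, len(fibers))
```

By contrast, the "diagonal" set `[(x, x) for x in inputs]` fails `is_rectangle`, which is one way to see why no single transcript of a deterministic protocol can correspond to exactly the equal-input pairs.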

One recent technique is an odd man out: namely, information complexity, which was formally introduced
by Chakrabarti et al. [CSWY01] and generalized in subsequent work [BJKS04, JKS03, BBCR10], though its
ideas appear in the earlier work of Ablayev [Abl96] (see also Saks and Sun [SS02]). Here, one defines
an information cost measure for a protocol that captures the "amount of information revealed" during its
execution, and then considers the resulting complexity measure $\mathrm{IC}(f)$, for a function $f$. A precise definition
of the cost measure admits a few variants, but all of them quite naturally lower bound the corresponding
communication cost. The power of this technique comes from a natural direct sum property of information
cost, which allows one to easily lower bound $\mathrm{IC}(f)$ for certain well-structured functions $f$. Specifically,
when $f$ is a "combination" of $n$ copies of a simpler function $g$, one can often scale up a lower bound on
$\mathrm{IC}(g)$ to obtain $\mathrm{IC}(f) \ge \Omega(n \cdot \mathrm{IC}(g))$. The burden then shifts to lower bounding $\mathrm{IC}(g)$, and at this stage the
rectangle property is invoked, but on protocols for $g$, not $f$.

A nice consequence of lower bounding $R(f)$ via a lower bound on $\mathrm{IC}(f)$ is that one then obtains a direct
sum theorem for free: that is, we obtain the bound $R(f^n) \ge \Omega(n \cdot \mathrm{IC}(f))$ as an almost immediate corollary.
We shall be more precise about this in Section 2.

1.2 First Contribution: Rectangular versus Informational Methods 

It is natural to ask how, quantitatively, these numerous lower bounding techniques relate to one another. 
One expects the various "rectangular" techniques to relate to one another, and indeed several such results 
are known [Kla03, LS09, JK10]. Here, we relate the "informational" technique to one of the most
powerful rectangular techniques, with respect to randomized communication complexity. To motivate our first
theorem, we begin with a sweeping conjecture. 

Conjecture 1.1. The best information complexity lower bound on $R(f)$ is, asymptotically, at least as good
as the smooth corruption (a.k.a., smooth rectangle) bound, and hence, at least as good as the corruption, 
smooth discrepancy and discrepancy bounds. 

We point out that a very recent manuscript of Kerenidis et al. [KLL+12] claims to have settled this
conjecture (for a natural setting of parameters). Since this work was done independently of theirs, and due to
the short interval between this writing and theirs, we shall continue to label the statement as (our) conjecture.

In conjunction with the results of Jain and Klauck [JK10], the above conjecture states that information
complexity subsumes just about every other lower bound technique for $R(f)$. All of these lower bound
techniques involve a choice of an input distribution. What we are able to prove is a special case of the conjecture:
the case when the input distributions involved are rectangular.³ The statement below is somewhat informal
and neither fully detailed nor fully general: a precise version appears as Theorem 3.1.

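Theorem 1.2. Let $\rho$ be a rectangular input distribution for a communication problem $f : \{-1,1\}^n \times \{-1,1\}^n \to \{-1,1\}$.
Then, with respect to $\rho$, for small enough errors $\varepsilon$, the information complexity bound
$\mathrm{IC}^{\rho}_{\varepsilon}(f)$ is asymptotically as good as the smooth corruption bound $\mathrm{scb}^{\rho}_{400\varepsilon,\varepsilon}(f)$ with error parameter $400\varepsilon$
and perturbation parameter $\varepsilon$. That is, we have $\mathrm{IC}^{\rho}_{\varepsilon}(f) = \Omega(\mathrm{scb}^{\rho}_{400\varepsilon,\varepsilon}(f))$.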

2 Jain and Klauck [JK10] used the term "smooth rectangle bound", but we shall prefer the more descriptive term "corruption" to
"rectangle" throughout this article. 

3 Some authors use the term "product distribution" for what we call rectangular distributions. 
Precise definitions of the terms in the above theorem are given in Section 2. We note that a recent
manuscript [BW11] lower bounds information complexity by discrepancy, a result that is similar in spirit to
ours. This result is incomparable with ours, because on the one hand discrepancy is a weaker technique than 
corruption, but on the other hand there is no restriction on the input distribution. 

We remark that our proof of Theorem 1.2 uses only elementary combinatorial and information theoretic
arguments, and proceeds along intuitive lines. Accordingly, we believe that it remains of independent
interest, despite the very recent claim to a stronger result by Kerenidis et al. [KLL+12].

1.3 Second Contribution: Information Complexity of Orthogonality and Gap-Hamming 

The APPROXIMATE-ORTHOGONALITY problem is a communication problem defined on inputs in $\{-1,1\}^n \times \{-1,1\}^n$
by the Boolean function

\[ \mathrm{ORT}_{b,n}(x,y) = \begin{cases} 1, & \text{if } |\langle x,y \rangle| \le b\sqrt{n}, \\ -1, & \text{otherwise.} \end{cases} \]

Here, $b$ is to be thought of as a constant parameter. This problem arose naturally in Sherstov's work on the
Gap-Hamming Distance problem [She11a]. This latter problem is defined as follows:



The Gap-Hamming problem has attracted plenty of attention over the last decade, starting from its formal
introduction in Indyk and Woodruff [IW03] in the context of data stream lower bounds, leading up to a
recent flurry of activity that has produced three different proofs [CR11, Vid11, She11a] of an optimal lower
bound $R(\mathrm{GHD}_n) = \Omega(n)$. In some recent work, Woodruff and Zhang [WZ11] identify a need for strong
lower bounds on $\mathrm{IC}(\mathrm{GHD})$, to be used in direct sum results. We now attempt to address such a lower bound.

At first sight, these problems appear to be ideally suited for a lower bound via information complexity: 
they are quite naturally combinations of n independent communication problems, each of which gives Alice 
and Bob a single input bit each. One feels that the uniform input distribution ought to be hard for them for 
the intuitive reason that a successful protocol cannot afford to ignore $\omega(\sqrt{n})$ of the coordinates of $x$ and $y$,
and must therefore convey $\Omega(1)$ information per coordinate for at least $\Omega(n)$ coordinates. However, turning
this intuition into a formal proof is anything but simple. 

Here, we prove an optimal $\Omega(n)$ lower bound on $\mathrm{IC}(\mathrm{ORT})$ under the uniform input distribution. This is a
consequence of Theorem 1.2 above, but there turns out to be a surprising amount of work in lower bounding
$\mathrm{scb}(\mathrm{ORT})$ under the uniform distribution. Our theorem involves the tail of the standard normal distribution,
which we denote by "tail":

\[ \mathrm{tail}(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2} \, dt . \]

We also reserve $\mu$ for the uniform distribution on $\{-1,1\}^n \times \{-1,1\}^n$.

Theorem 1.3. Let $b$ be a sufficiently large constant. Then, the corruption bound $\mathrm{cb}^{1,\mu}_{\theta}(\mathrm{ORT}_{b,n}) = \Omega(n)$, for
$\theta = \mathrm{tail}(2.01b)$. Hence, by Theorem 1.2, we have $\mathrm{IC}^{\mu}_{\theta/400}(\mathrm{ORT}_{b,n}) = \Omega(n)$.

Again, precise definitions of the terms in the above theorem are given in Section 2, and the proof of
the theorem appears in Section 4. As it turns out, a slight strengthening of the parameter $\theta$ in the above
theorem would give us the result $\mathrm{IC}^{\mu}_{\varepsilon}(\mathrm{GHD}_n) = \Omega(n)$. This is because the following result, stated somewhat
imprecisely for now, connects the two problems.
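Theorem 1.4. Let $b$ be a sufficiently large constant and let $\theta = \mathrm{tail}(1.99b)$. Then, we have
$\mathrm{scb}^{1,\mu}_{400\theta,\theta}(\mathrm{GHD}_n) = \Omega(\mathrm{cb}^{1,\mu}_{400\theta}(\mathrm{ORT}_{b,n})) - O(\sqrt{n})$. By Theorem 1.2, we then have $\mathrm{IC}^{\mu}_{\theta}(\mathrm{GHD}_n) = \Omega(\mathrm{cb}^{1,\mu}_{400\theta}(\mathrm{ORT}_{b,n})) - O(\sqrt{n})$.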

We note that Chakrabarti and Regev [CR11] state that their lower bound technique for $R(\mathrm{GHD}_n)$ can be
captured within the smooth rectangle bound framework. While this is true in spirit, there is a significant
devil in the details, and their technique does not yield a good lower bound on $\mathrm{scb}^{\mu}_{\varepsilon,\delta}(\mathrm{GHD}_n)$ for the uniform
distribution $\mu$. We explain more in Section 4.

These theorems suggest a natural follow-up conjecture that we leave open. 

Conjecture 1.5. There exists a constant $\varepsilon$ such that $\mathrm{IC}^{\mu}_{\varepsilon}(\mathrm{GHD}_n) = \Omega(n)$.

1.4 Direct Sum

A direct sum theorem states that solving m independent instances of a problem requires about m times the 
resources that solving a single instance does. It could apply to a number of models of computation, with 
"resources" interpreted appropriately. For our model of two-party communication, it works as follows. For 
a function $f : \mathcal{X} \times \mathcal{Y} \to \{-1,1\}$, let $f^m : \mathcal{X}^m \times \mathcal{Y}^m \to \{-1,1\}^m$ denote the function given by

\[ f^m(x_1,\ldots,x_m,y_1,\ldots,y_m) = (f(x_1,y_1),\ldots,f(x_m,y_m)) . \]

Notice that $f^m$ is not a Boolean function. We will define $R(f^m)$ to be the randomized communication
complexity of the task of outputting a vector $(z_1,\ldots,z_m)$ such that for each $i \in [m]$, we have $f(x_i,y_i) = z_i$
with high probability. Then, a direct sum theorem for randomized communication complexity would say
that $R(f^m) = \Omega(m \cdot R(f))$. Whether or not such a theorem holds for a general $f$ is a major open question in
the field.
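
As a small illustration of the definition of $f^m$, the following Python sketch builds the $m$-fold function from an arbitrary base function. It is only a sketch: the helper names `direct_sum` and `inner_product` are assumptions made for this example, and the base function is the inner product function in the $\pm 1$ encoding, chosen for concreteness.

```python
def direct_sum(f, m):
    """Return f^m, which maps (x_1..x_m, y_1..y_m) to (f(x_1,y_1), ..., f(x_m,y_m))."""
    def f_m(xs, ys):
        if len(xs) != m or len(ys) != m:
            raise ValueError("expected exactly m instances per player")
        # Each output coordinate depends only on the matching pair of instances.
        return tuple(f(x, y) for x, y in zip(xs, ys))
    return f_m

def inner_product(x, y):
    """Inner product mod 2 in the +/-1 encoding (-1 plays the role of bit 1)."""
    both = sum(1 for a, b in zip(x, y) if a == -1 and b == -1)
    return -1 if both % 2 else 1

f3 = direct_sum(inner_product, 3)
xs = ((-1, -1), (1, -1), (-1, 1))   # Alice's three instances
ys = ((-1, 1), (1, -1), (1, 1))     # Bob's three instances
print(f3(xs, ys))
```

The output is a vector in $\{-1,1\}^m$ rather than a single bit, which is why the error requirement for $R(f^m)$ has to be stated per coordinate.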

Information complexity, by its very design, provides a natural approach towards proving a direct sum 
theorem. Indeed, this was the original motivation of Chakrabarti et al. [CSWY01] in introducing
information complexity; they proved a direct sum theorem for randomized simultaneous-message and one-way
complexity, for functions $f$ satisfying a certain "robustness" condition. Still using information complexity,
Jain et al. [JRS03] proved a direct sum theorem for bounded-round randomized complexity, when $f$ is hard
under a product distribution. Recently, Barak et al. [BBCR10] used information complexity, together with
a protocol compression approach, to mount the strongest attack yet on the direct sum question for $R(f)$, for
fairly general $f$: they show that $R(f^m) = \tilde{\Omega}(\sqrt{m} \cdot R(f))$, where the $\tilde{\Omega}$ notation ignores logarithmic factors.

One consequence of our work here is a simple proof of a direct sum theorem for randomized
communication complexity for functions whose hardness is captured by a smooth corruption bound (which in turn
subsumes corruption, discrepancy and smooth discrepancy [JK10]) under a rectangular distribution. This
includes the well-studied INNER-PRODUCT function, and thanks to our Theorem 1.3 it also includes ORT.
Should Conjecture 1.1 be shown to hold, we could remove the rectangularity constraint altogether and
capture additional important functions such as DISJOINTNESS, whose hardness seems to be captured only by
considering corruption under a non-rectangular distribution.

We note that the protocol compression approach [BBCR10] gives a strong direct sum result for
distributional complexity under rectangular distributions, but still not as strong as ours because their result contains
a not-quite-benign polylogarithmic factor. We say more about this in Section 4.

Comparison with Direct Product. Other authors have considered a related, yet different, concept of 
direct product theorems. A strong direct product theorem (henceforth, SDPT) says that computing $f^m$ with
a correctness probability as small as $2^{-\Omega(m)}$ (but more than the trivial guessing bound) requires $\Omega(m \cdot R(f))$
communication, where "correctness" means getting all $m$ coordinates of the output right. It is known that
SDPTs do not hold in all situations [Sha03], but do hold for (generalized) discrepancy [LSŠ08, She11b], an
especially important technique in lower bounding quantum communication. A recent manuscript offers an
SDPT for bounded-round randomized communication [JPY12].

Although strong direct product theorems appear stronger than direct sum theorems,⁴ they are in fact
incomparable. A protocol could conceivably achieve low error on each coordinate of $f^m(x_1,\ldots,x_m,y_1,\ldots,y_m)$
while also having zero probability of getting the entire $m$-tuple right.
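
The last remark can be checked on a toy error pattern. The Python sketch below assumes a hypothetical protocol that answers every coordinate correctly except one, chosen uniformly at random; this error model is an illustrative assumption, not a protocol from the literature. It computes the per-coordinate error and the probability that the whole $m$-tuple is correct, using exact rational arithmetic.

```python
from fractions import Fraction

def error_profile(m):
    """Hypothetical protocol: exactly one coordinate, chosen uniformly, is wrong."""
    # outcomes[bad][i] is True when coordinate i is answered correctly.
    outcomes = [tuple(i != bad for i in range(m)) for bad in range(m)]
    p = Fraction(1, m)  # each outcome is equally likely
    per_coord_error = [sum(p for o in outcomes if not o[i]) for i in range(m)]
    all_correct = sum(p for o in outcomes if all(o))
    return per_coord_error, all_correct

per_coord, all_correct = error_profile(10)
print(per_coord[0], all_correct)
```

Each coordinate is wrong with probability only $1/m$, yet the event that all $m$ coordinates are right never occurs, so a per-coordinate guarantee says nothing about the all-coordinates guarantee used in direct product theorems.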

2 Preliminaries 

Consider a function $f : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}$, where $\mathcal{X}$, $\mathcal{Y}$, $\mathcal{Z}$ are nonempty finite sets. Although we will develop
some initial theory under this general setting, it will be useful to keep in mind the important special case
$\mathcal{X} = \mathcal{Y} = \{-1,1\}^n$ and $\mathcal{Z} = \{-1,1\}$. We can interpret such a function $f$ as a communication problem
wherein Alice receives an input $x \in \mathcal{X}$, Bob receives an input $y \in \mathcal{Y}$, and the players must communicate
according to a protocol $P$ to come up with a value $z \in \mathcal{Z}$ that is hopefully equal to $f(x,y)$. The sequence
of messages exchanged by the players when executing $P$ on input $(x,y)$ is called the transcript of $P$ on that
input, and denoted $P(x,y)$. We require that the transcript be a sequence of bits, and end with (a binary
encoding of) the agreed-upon output. We denote the output corresponding to a transcript $t$ by $\mathrm{out}(t)$: thus,
the output of $P$ on input $(x,y)$ is $\mathrm{out}(P(x,y))$.

Our protocols will, in general, be randomized protocols with a public coin as well as a private coin 
for each player. When we disallow the public coin, we will explicitly state that the protocol is
private-coin. Notice that $P(x,y)$ is a random string, even for a fixed input $(x,y)$. For a real quantity $\varepsilon \ge 0$, we say
that $P$ computes $f$ with $\varepsilon$ error if $\Pr[\mathrm{out}(P(x,y)) \ne f(x,y)] \le \varepsilon$, the probability being with respect to the
randomness used by $P$ and the input distribution. We define the cost of $P$ to be the worst case length of its
transcript, $\max |P(x,y)|$, where we maximize over all inputs $(x,y)$ and over all possible outcomes of the coin
tosses in $P$. Finally, the $\varepsilon$-error randomized communication complexity of $f$ is defined by

\[ R_{\varepsilon}(f) = \min\{\mathrm{cost}(P) : P \text{ computes } f \text{ with error } \varepsilon\} . \]

In case $\mathcal{Z} = \{-1,1\}$, we also put $R(f) = R_{1/3}(f)$.

For random variables $A, B, C$, we use notations of the form $H(A)$, $H(A \mid C)$, $H(AB)$, $I(A : B)$, and
$I(A : B \mid C)$ to denote entropy, conditional entropy, joint entropy, mutual information, and conditional mutual
information respectively. For discrete probability distributions $\lambda, \mu$, we use $D_{\mathrm{KL}}(\lambda \,\|\, \mu)$ to denote the
relative entropy (a.k.a., informational divergence or Kullback-Leibler divergence) from $\lambda$ to $\mu$ using logarithms
to the base 2. These standard information theoretic concepts are well described in a number of textbooks, 
e.g., Cover and Thomas [CT06]. 
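
For readers who prefer executable definitions, the following Python sketch spells out entropy, relative entropy and mutual information with base-2 logarithms. It is a minimal illustration under the assumption that a distribution is stored as a dictionary from outcomes to probabilities; the function names are chosen for this example only.

```python
from math import log2

def entropy(p):
    """Shannon entropy H (in bits) of a distribution {outcome: probability}."""
    return -sum(q * log2(q) for q in p.values() if q > 0)

def kl_divergence(lam, mu):
    """Relative entropy D_KL(lam || mu) in bits; infinite if lam is not dominated by mu."""
    total = 0.0
    for a, q in lam.items():
        if q > 0:
            if mu.get(a, 0) == 0:
                return float("inf")
            total += q * log2(q / mu[a])
    return total

def mutual_information(joint):
    """I(A : B) for a joint distribution {(a, b): probability}.

    Uses the identity I(A : B) = D_KL(joint || product of the marginals).
    """
    pa, pb = {}, {}
    for (a, b), q in joint.items():
        pa[a] = pa.get(a, 0) + q
        pb[b] = pb.get(b, 0) + q
    prod = {(a, b): pa[a] * pb[b] for a in pa for b in pb}
    return kl_divergence(joint, prod)

correlated = {(0, 0): 0.5, (1, 1): 0.5}   # two perfectly correlated bits
print(entropy({0: 0.5, 1: 0.5}), mutual_information(correlated))
```

For two perfectly correlated uniform bits the mutual information equals the entropy of either bit, and for independent bits it is zero.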

Let $\lambda$ be an input distribution for $f$, i.e., a probability distribution on $\mathcal{X} \times \mathcal{Y}$. We say that $\lambda$ is a
rectangular distribution if we can write it as a tensor product $\lambda = \lambda_1 \otimes \lambda_2$, where $\lambda_1$, $\lambda_2$ are distributions on
$\mathcal{X}$, $\mathcal{Y}$ respectively. Now consider a general $\lambda$ and let $(X,Y) \sim \lambda$ be a random input for $f$ drawn from this
joint distribution. We define the $\lambda$-information-cost of the protocol $P$ to be $\mathrm{icost}^{\lambda}(P) = I(XY : P(X,Y) \mid R)$,
where $R$ denotes the public randomness used by $P$. This cost measure gives us a different complexity
measure called the $\varepsilon$-error information complexity of $f$, under $\lambda$:

\[ \mathrm{IC}^{\lambda}_{\varepsilon}(f) = \min\{\mathrm{icost}^{\lambda}(P) : P \text{ computes } f \text{ with error } \varepsilon\} . \]

We note that in the terminology of Barak et al. [BBCR10], the above quantity would be called the external
information complexity, as opposed to the internal one, which is based on the cost function
$I(X : P(X,Y), R \mid Y) + I(Y : P(X,Y), R \mid X)$. As noted by them, the two cost measures coincide under a rectangular input
distribution. Since our work only concerns rectangular distributions, this internal/external distinction is not
important to us.

4 Some authors interpret "direct sum" as requiring correctness of the entire m-tuple output with high probability. Under this
interpretation, direct product theorems indeed subsume direct sum theorems. Our definition of direct sum is arguably more natural,
because under our definition, we at least have $R(f^m) = O(m \cdot R(f))$ always.

It is easy to see (and by now well-known) that information complexity under any input distribution lower 
bounds randomized communication complexity. 

Fact 2.1. For every input distribution $\lambda$ and error $\varepsilon$, we have $R_{\varepsilon}(f) \ge \mathrm{IC}^{\lambda}_{\varepsilon}(f)$.

Proof. Simply observe that $I(XY : P(X,Y) \mid R) \le H(P(X,Y)) \le |P(X,Y)|$. □

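
The chain of inequalities in this proof can be seen numerically on a toy example. The Python sketch below assumes a hypothetical deterministic protocol (Alice sends her 2-bit input, Bob replies with one bit indicating equality) and the uniform input distribution. For a deterministic protocol the transcript is a function of the inputs, so the information cost equals the entropy of the transcript, which in turn is at most the transcript length.

```python
from itertools import product
from math import log2
from collections import Counter

def transcript(x, y):
    """Hypothetical deterministic protocol: Alice sends x, Bob replies with [x == y]."""
    return x + (int(x == y),)

n = 2
inputs = list(product([0, 1], repeat=n))
pairs = list(product(inputs, inputs))      # support of the uniform distribution

# Distribution of the transcript T = P(X, Y) under uniform inputs.
counts = Counter(transcript(x, y) for x, y in pairs)
probs = [c / len(pairs) for c in counts.values()]

# T is determined by (X, Y) and there is no public coin, so I(XY : T) = H(T).
info_cost = -sum(p * log2(p) for p in probs)
max_len = max(len(t) for t in counts)      # worst-case transcript length in bits
print(info_cost, max_len)
```

Here the information cost is strictly below the 3 bits communicated, because Bob's reply is biased towards "not equal" and so carries less than one bit of entropy.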
2.1 Corruption and Smooth Corruption 

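We consider a communication problem given by a partial function, $f : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z} \cup \{*\}$. We say that the
function $f$ is undefined on an input $(x,y) \in \mathcal{X} \times \mathcal{Y}$ iff $f(x,y) = *$. For such inputs we say that a protocol
$P$ computes $f$ correctly on $(x,y)$ always, irrespective of what $P$ outputs. Therefore, we say that a protocol $P$
computes $f$ with error $\varepsilon \ge 0$ if $\Pr[f(x,y) \ne * \wedge \mathrm{out}(P(x,y)) \ne f(x,y)] \le \varepsilon$ where, as before, the probability
is with respect to the randomness used by $P$ and the input distribution.

Pick a particular $z \in \mathcal{Z}$. A set $S \subseteq \mathcal{X} \times \mathcal{Y}$ is said to be rectangular if we have $S = S_1 \times S_2$, where
$S_1 \subseteq \mathcal{X}$, $S_2 \subseteq \mathcal{Y}$. Following Beame et al. [BPSW06], we say that $S$ is $\varepsilon$-error $z$-monochromatic for $f$ under
$\lambda$ if $\lambda(S \setminus (f^{-1}(z) \cup f^{-1}(*))) \le \varepsilon \lambda(S)$. We then define

\[ \varepsilon\text{-mono}^{z,\lambda}(f) = \max\{\lambda(S) : S \text{ is rectangular and } \varepsilon\text{-error } z\text{-monochromatic}\} , \tag{1} \]
\[ \mathrm{cb}^{z,\lambda}_{\varepsilon}(f) = -\log(\varepsilon\text{-mono}^{z,\lambda}(f)) , \tag{2} \]
\[ \mathrm{scb}^{z,\lambda}_{\varepsilon,\delta}(f) = \max\{\mathrm{cb}^{z,\lambda}_{\varepsilon}(g) : g \in (\mathcal{Z} \cup \{*\})^{\mathcal{X} \times \mathcal{Y}}, \ \Pr_{(X,Y) \sim \lambda}[f(X,Y) \ne g(X,Y)] \le \delta\} . \tag{3} \]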

The quantities $\mathrm{cb}^{z,\lambda}_{\varepsilon}(f)$ and $\mathrm{scb}^{z,\lambda}_{\varepsilon,\delta}(f)$ are called the corruption bound and the smooth corruption bound
respectively, under the indicated choice of parameters. In the latter quantity, we refer to $\varepsilon$ as the error
parameter and $\delta$ as the perturbation parameter. One can go on to define bounds independent of $z$ and $\lambda$ by
appropriately maximizing over these two parameters, but we shall not do that here.
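
For very small input sets, the corruption bound can be computed by brute force straight from its definition. The Python sketch below is an illustration only: it enumerates all rectangles, which is exponential in the size of the input sets, and the names `eps_mono` and `corruption_bound`, the use of base-2 logarithms, and the toy equality function are assumptions made for this example.

```python
from itertools import product, combinations
from math import log2

def nonempty_subsets(items):
    for r in range(1, len(items) + 1):
        yield from combinations(items, r)

def eps_mono(f, X, Y, lam, z, eps, undefined="*"):
    """Largest lam-mass of a rectangle that is eps-error z-monochromatic for f."""
    best = 0.0
    for S1 in nonempty_subsets(X):
        for S2 in nonempty_subsets(Y):
            mass = sum(lam[x, y] for x in S1 for y in S2)
            # Mass of inputs in the rectangle where f is defined and differs from z.
            bad = sum(lam[x, y] for x in S1 for y in S2
                      if f(x, y) not in (z, undefined))
            if bad <= eps * mass:
                best = max(best, mass)
    return best

def corruption_bound(f, X, Y, lam, z, eps):
    """cb = -log2(eps-mono); infinite if only zero-mass rectangles qualify."""
    m = eps_mono(f, X, Y, lam, z, eps)
    return float("inf") if m == 0 else -log2(m)

# Toy example: equality on {-1,1}^2 under the uniform (rectangular) distribution.
X = Y = list(product([-1, 1], repeat=2))
lam = {(x, y): 1 / 16 for x in X for y in Y}
eq = lambda x, y: 1 if x == y else -1

cb_equal = corruption_bound(eq, X, Y, lam, z=1, eps=0)
cb_unequal = corruption_bound(eq, X, Y, lam, z=-1, eps=0)
print(cb_equal, cb_unequal)
```

With zero error, rectangles that are monochromatic for the "equal" value are single cells, while rectangles monochromatic for the "unequal" value can be as large as a quarter of the matrix, so the bound is larger for $z = 1$.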

We note that Jain and Klauck [JK10] use somewhat different notation: what we have called scb above is
the logarithm of (a slight variant of) the quantity that they call the "natural definition of the smooth rectangle 
bound" and denote srec. 

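What justifies calling these quantities "bounds" is that they can be shown to lower bound $R_{\varepsilon'}(f)$ for
sufficiently small $\delta, \varepsilon, \varepsilon'$, under a mild condition on $\lambda$. It is clear that $\mathrm{scb}^{z,\lambda}_{\varepsilon,\delta}(f) \ge \mathrm{cb}^{z,\lambda}_{\varepsilon}(f)$, so we mention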
only the stronger result, that involves the smooth corruption bound. 

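Fact 2.2 (Jain and Klauck [JK10]). Let $f : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z} \cup \{*\}$, $z \in \mathcal{Z}$ and distribution $\lambda$ on $\mathcal{X} \times \mathcal{Y}$
be such that $\lambda(f^{-1}(z)) \ge 1/3$. Then there is an absolute constant $c > 0$ such that, for a sufficiently small
constant $\varepsilon$, we have $R_{\varepsilon}(f) \ge c \cdot \mathrm{scb}^{z,\lambda}_{2\varepsilon,\varepsilon/2}(f)$. □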

The constant 1/3 above is arbitrary and can be parametrized, but we avoid doing this to keep things 
simple. The proof of the above fact is along the expected lines: an application of (the easy direction of) 
Yao's minimax lemma, followed by a straightforward estimation argument applied to the rectangles of the 
resulting deterministic protocol. Note that we never have to involve the linear-programming-based smooth 
rectangle bound as defined by Jain and Klauck. 
3 Information Complexity versus Corruption



We are now in a position to tackle our first theorem. 

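Theorem 3.1 (Precise restatement of Theorem 1.2). Suppose we have a function $f : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z} \cup \{*\}$,
a rectangular distribution $\rho$ on $\mathcal{X} \times \mathcal{Y}$, and $z \in \mathcal{Z}$ satisfying $\rho(f^{-1}(z)) \ge 3/20$. Let $\varepsilon, \varepsilon'$ be reals with
$0 \le 384\varepsilon \le \varepsilon' < 1/4$. Then

\[ \mathrm{IC}^{\rho}_{\varepsilon}(f) \ge \frac{1}{400} \, \mathrm{scb}^{z,\rho}_{\varepsilon',\varepsilon}(f) - \frac{1}{50} = \Omega(\mathrm{scb}^{z,\rho}_{\varepsilon',\varepsilon}(f)) - O(1) . \]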

To prove this, we first consider a notion that we call the distortion of a transcript of a communication
protocol. Let $\rho$ be an input distribution for a communication problem, let $P$ be a protocol for the problem,
and let $t$ be a transcript of $P$. We define $\sigma_t = \sigma_t(\rho)$ to be the distribution $(\rho \mid P(X,Y) = t)$. We think of the
relative entropy $D_{\mathrm{KL}}(\sigma_t \,\|\, \rho)$ as a distortion measure for $t$: intuitively, if $t$ conveys little information about
the inputs, then this distortion should be low. The following lemma makes this intuition precise. Notice that
it does not assume that $\rho$ is rectangular.

For the remainder of this section, to keep the notation simple while handling partial functions, we write
$g(x,y) \ne z$ to actually denote the event $g(x,y) \ne z \wedge g(x,y) \ne *$ for $z \in \mathcal{Z}$, unless specified otherwise.

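Lemma 3.2. Let $P$ be a private-coin protocol that computes $g : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z} \cup \{*\}$ with error $\varepsilon \le 1/500$.
Let $z \in \mathcal{Z}$ and let $\rho$ be an arbitrary distribution on $\mathcal{X} \times \mathcal{Y}$ with $\rho(g^{-1}(z)) \ge 3/20 - 1/500$. Then, there
exists a ("low-distortion") transcript $t$ of $P$ such that

\[ \mathrm{out}(t) = z , \tag{4} \]
\[ D_{\mathrm{KL}}(\sigma_t \,\|\, \rho) \le 50 \, \mathrm{icost}^{\rho}(P) , \text{ and} \tag{5} \]
\[ \Pr[g(X,Y) \ne z \mid T = t] \le 8\varepsilon , \tag{6} \]

where $(X,Y) \sim \rho$ and $T = P(X,Y)$.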

Proof. Let $\tau$ denote the distribution on transcripts given by $P(X,Y)$. By basic results in information
theory [CT06], we have

\[ \mathrm{icost}^{\rho}(P) = I(XY : T) = \mathbb{E}_{t \sim \tau}\left[ D_{\mathrm{KL}}(\sigma_t \,\|\, \rho) \right] . \]

Consider a random choice of $t$ according to $\tau$. By Markov's inequality, conditions (5) and (6) fail with
probability at most 1/50 and 1/8 respectively. By the lower bound on $\rho(g^{-1}(z))$, condition (4) fails with
probability at most $17/20 + 1/500 + \varepsilon$. Since $\varepsilon \le 1/500$, and $1/8 + 1/50 + 17/20 + 1/500 + 1/500 < 1$, it
follows that there exists a choice of $t$ satisfying all three conditions. □
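
The union bound at the end of this proof is tight enough that it is worth verifying exactly. The short Python check below uses rational arithmetic; the variable names are chosen for this example, and the error is set to its largest allowed value 1/500.

```python
from fractions import Fraction as F

eps = F(1, 500)                              # largest error allowed by the lemma
fail_distortion = F(1, 50)                   # condition (5): Markov with factor 50
fail_error = F(1, 8)                         # condition (6): Markov with factor 8
fail_output = F(17, 20) + F(1, 500) + eps    # condition (4): out(t) differs from z

total = fail_distortion + fail_error + fail_output
print(total, total < 1)
```

The three failure probabilities sum to 999/1000, so a transcript satisfying all three conditions exists, although with very little room to spare in the constants.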

Property (6) in the above lemma should be interpreted as a low-error guarantee for the transcript $t$. We
now argue that the existence of such a transcript implies the existence of a "large" low-corruption rectangle,
provided the input distribution $\rho$ is rectangular: this is the only point in the proof that uses rectangularity.
One has to be careful with the interpretation of "large" here: it means large under $\sigma_t$, and not $\rho$. However,
later on we will add in the low-distortion guarantee of Lemma 3.2 to conclude largeness under $\rho$ as well.

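Lemma 3.3. Let $t$ be a transcript of a private-coin protocol $P$ for $g : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z} \cup \{*\}$. Let $\rho$ be a
rectangular distribution on $\mathcal{X} \times \mathcal{Y}$, $z \in \mathcal{Z}$, $(X,Y) \sim \rho$, $T = P(X,Y)$, and $\varepsilon \ge 0$. Suppose

\[ \Pr[g(X,Y) \ne z \mid T = t] \le \varepsilon . \tag{7} \]

Then there exists a rectangle $L \subseteq \mathcal{X} \times \mathcal{Y}$ such that

\[ \sigma_t(L) \ge 9/16 , \text{ and} \tag{8} \]
\[ \Pr[g(X,Y) \ne z \mid (X,Y) \in L] \le 16\varepsilon . \tag{9} \]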






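Proof. By the rectangle property for private-coin protocols [BJKS04, Lemma 6.7], there exist mappings
$q_1 : \mathcal{X} \to [0,1]$, $q_2 : \mathcal{Y} \to [0,1]$ such that $\Pr[T = t \mid X = x, Y = y] = q_1(x) q_2(y)$.
Let $\tau$ denote the distribution of $T$. We can rewrite the condition (7) as

\[ \sum_{x \in \mathcal{X},\, y \in \mathcal{Y} :\, g(x,y) \ne z} q_1(x) q_2(y) \rho(x,y) \le \varepsilon \, \tau(t) . \tag{10} \]

Consider the set $\mathcal{A}$ of rows whose contribution to the left hand side of (10) is "low," i.e.,

\[ \mathcal{A} = \Big\{ x \in \mathcal{X} : \sum_{y :\, g(x,y) \ne z} q_2(y) \rho(x,y) \le 4\varepsilon \sum_{y} q_2(y) \rho(x,y) \Big\} . \]

Then, by a Markov-inequality-style argument, we have $\Pr[X \in \mathcal{A} \mid T = t] \ge 3/4$.

Similarly, consider the following set $\mathcal{B}$ of columns (notice that we sum over only $x \in \mathcal{A}$):

\[ \mathcal{B} = \Big\{ y \in \mathcal{Y} : \sum_{x \in \mathcal{A} :\, g(x,y) \ne z} \rho(x,y) \le 16\varepsilon \sum_{x \in \mathcal{A}} \rho(x,y) \Big\} . \]

We now claim that the rectangle $\mathcal{A} \times \mathcal{B}$ has the desired properties.

From the definition of $\mathcal{B}$, it follows that for all $y \in \mathcal{B}$, $\Pr[g(X,y) \ne z \mid X \in \mathcal{A}] \le 16\varepsilon$. Therefore, we
have $\Pr[g(X,Y) \ne z \mid (X,Y) \in \mathcal{A} \times \mathcal{B}] \le 16\varepsilon$ and hence, the rectangle $\mathcal{A} \times \mathcal{B}$ satisfies condition (9).

Since we know that $\Pr[X \in \mathcal{A} \mid T = t] \ge 3/4$, to prove that $\Pr[(X,Y) \in \mathcal{A} \times \mathcal{B} \mid T = t] \ge 9/16$ we will
first show that the columns in $\mathcal{B}$ have significant "mass" in $\mathcal{A}$ using averaging arguments.

Claim 3.4. We have $\sum_{x \in \mathcal{A}} \sum_{y \in \mathcal{B}} q_2(y) \rho(x,y) \ge \frac{3}{4} \sum_{x \in \mathcal{A}} \sum_{y \in \mathcal{Y}} q_2(y) \rho(x,y)$.

Proof. Assume not. Then $\sum_{x \in \mathcal{A}} \sum_{y \in \mathcal{Y} \setminus \mathcal{B}} q_2(y) \rho(x,y) > \frac{1}{4} \sum_{x \in \mathcal{A}} \sum_{y \in \mathcal{Y}} q_2(y) \rho(x,y)$. Therefore,

\[ \sum_{y \in \mathcal{Y}} \ \sum_{x \in \mathcal{A} :\, g(x,y) \ne z} q_2(y) \rho(x,y) \ \ge \sum_{y \in \mathcal{Y} \setminus \mathcal{B}} q_2(y) \sum_{x \in \mathcal{A} :\, g(x,y) \ne z} \rho(x,y) \]
\[ > 16\varepsilon \sum_{y \in \mathcal{Y} \setminus \mathcal{B}} q_2(y) \sum_{x \in \mathcal{A}} \rho(x,y) \quad \text{(by def of } \mathcal{B}\text{)} \]
\[ \ge 4\varepsilon \sum_{y \in \mathcal{Y}} q_2(y) \sum_{x \in \mathcal{A}} \rho(x,y) , \]

which contradicts the definition of $\mathcal{A}$. □

Recall that $\rho$ is a rectangular distribution. Suppose $\eta_1$ and $\eta_2$ are its marginals, i.e., $\rho(x,y) = \eta_1(x) \eta_2(y)$.
We now observe that the fraction $\sum_{y \in \mathcal{B}} q_2(y) \rho(x,y) \big/ \sum_{y \in \mathcal{Y}} q_2(y) \rho(x,y)$ is the same for all $x \in \mathcal{X}$. We have

\[ \frac{\sum_{y \in \mathcal{B}} q_2(y) \rho(x,y)}{\sum_{y \in \mathcal{Y}} q_2(y) \rho(x,y)} = \frac{\sum_{y \in \mathcal{B}} q_2(y) \eta_1(x) \eta_2(y)}{\sum_{y \in \mathcal{Y}} q_2(y) \eta_1(x) \eta_2(y)} = \frac{\eta_1(x) \sum_{y \in \mathcal{B}} q_2(y) \eta_2(y)}{\eta_1(x) \sum_{y \in \mathcal{Y}} q_2(y) \eta_2(y)} = \frac{\sum_{y \in \mathcal{B}} q_2(y) \eta_2(y)}{\sum_{y \in \mathcal{Y}} q_2(y) \eta_2(y)} , \]

which is indeed independent of $x$. Denote this fraction by $\kappa$. With the above observation and Claim 3.4, we
can conclude that $\kappa \ge 3/4$. We can now prove that the rectangle $\mathcal{A} \times \mathcal{B}$ satisfies condition (8) as follows:

\[ \sigma_t(\mathcal{A} \times \mathcal{B}) = \sum_{x \in \mathcal{A}} \sum_{y \in \mathcal{B}} \rho(x,y) q_1(x) q_2(y) / \tau(t) \]
\[ = \sum_{x \in \mathcal{A}} q_1(x) \sum_{y \in \mathcal{B}} q_2(y) \rho(x,y) / \tau(t) \]
\[ = \kappa \sum_{x \in \mathcal{A}} q_1(x) \sum_{y \in \mathcal{Y}} q_2(y) \rho(x,y) / \tau(t) \]
\[ = \kappa \sum_{x \in \mathcal{A}} \sum_{y \in \mathcal{Y}} \rho(x,y) q_1(x) q_2(y) / \tau(t) \]
\[ = \kappa \Pr[X \in \mathcal{A} \mid T = t] \ge \frac{3\kappa}{4} \ge \frac{9}{16} . \qquad \Box \]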

The proof of our next lemma uses the (classical) Substate Theorem due to Jain, Radhakrishnan and
Sen [JRS09]. We state this below in a form that is especially useful for us: it says roughly that if the relative
entropy $D_{\mathrm{KL}}(\lambda_1 \,\|\, \lambda_2)$ is upper bounded, then the events that have significant probability under $\lambda_1$ continue
to have significant probability under $\lambda_2$.

Fact 3.5 (Substate Theorem [JRS09]). Let $\lambda_1$ and $\lambda_2$ be distributions on a set $\mathcal{X}$ with $D_{\mathrm{KL}}(\lambda_1 \,\|\, \lambda_2) \le d$,
for some positive $d$. Then, for all $S \subseteq \mathcal{X}$, we have $\lambda_2(S) \ge \lambda_1(S) / 2^{2 + 2/\lambda_1(S) + 2d/\lambda_1(S)}$. □
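
As a sanity check on the form of the Substate Theorem used here, the following Python sketch tests the stated inequality on randomly generated pairs of distributions. It is a numerical illustration, not a proof; the helper names, the seed, and the choice of a six-element ground set are arbitrary assumptions. The comparison is done on a log scale to avoid overflow when $\lambda_1(S)$ is small.

```python
import random
from math import log2

def kl(l1, l2):
    """Relative entropy D_KL(l1 || l2) in bits for strictly positive lists."""
    return sum(p * log2(p / q) for p, q in zip(l1, l2) if p > 0)

def random_dist(k, rng):
    """Random strictly positive distribution on k points."""
    w = [rng.random() + 1e-3 for _ in range(k)]
    s = sum(w)
    return [v / s for v in w]

def substate_holds(l1, l2, S):
    """Check l2(S) >= l1(S) / 2^(2 + 2/l1(S) + 2d/l1(S)) with d = D_KL(l1 || l2)."""
    d = kl(l1, l2)
    p1 = sum(l1[i] for i in S)
    p2 = sum(l2[i] for i in S)
    # Compare logarithms: log2(p2) >= log2(p1) - (2 + 2/p1 + 2d/p1).
    return log2(p2) >= log2(p1) - (2 + 2 / p1 + 2 * d / p1)

rng = random.Random(0)
ok = True
for _ in range(1000):
    l1, l2 = random_dist(6, rng), random_dist(6, rng)
    S = [i for i in range(6) if rng.random() < 0.5] or [0]   # nonempty event
    ok = ok and substate_holds(l1, l2, S)
print(ok)
```

The bound is far from tight on such examples; its value lies in the fact that the loss depends only on $d$ and $\lambda_1(S)$, which is what the next lemma exploits.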

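Lemma 3.6. Let $t$ be a transcript of a private-coin protocol $P$ for $g : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z} \cup \{*\}$, and suppose
$\mathrm{out}(t) = z \in \mathcal{Z}$. Let $\rho$ be a rectangular distribution on $\mathcal{X} \times \mathcal{Y}$, and $\varepsilon < 1$. Then at most one of the
following conditions can hold:

\[ D_{\mathrm{KL}}(\sigma_t \,\|\, \rho) \le (\mathrm{cb}^{z,\rho}_{\varepsilon}(g) - 7)/4 , \tag{11} \]
\[ \Pr[g(X,Y) \ne z \mid T = t] \le \varepsilon/16 , \tag{12} \]

where $(X,Y) \sim \rho$, $T = P(X,Y)$, and $\sigma_t = (\rho \mid T = t)$.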



Proof. Suppose condition (12) holds. Then Lemma 3.3 implies that there exists a rectangle $L$ such that
$\sigma_t(L) \ge 9/16$ and $\Pr[g(X,Y) \ne z \mid (X,Y) \in L] \le \varepsilon$. The latter condition may be rewritten as
$\rho(L \setminus (g^{-1}(z) \cup g^{-1}(*))) \le \varepsilon \rho(L)$, i.e., $L$ is $\varepsilon$-error $z$-monochromatic for $g$ under $\rho$.

Suppose (11) also holds. Then, by the Substate Theorem, for every subset $S \subseteq \mathcal{X} \times \mathcal{Y}$, we have

\[ \rho(S) \ge \frac{\sigma_t(S)}{2^{2 + 2/\sigma_t(S) + 2d/\sigma_t(S)}} , \]

where $d = D_{\mathrm{KL}}(\sigma_t \,\|\, \rho)$. Taking $S$ to be the above rectangle $L$, and noting that $\sigma_t(L) > 1/2$, we have

\[ \rho(L) > \frac{1}{2^{7 + 4d}} \ge \frac{1}{2^{\mathrm{cb}^{z,\rho}_{\varepsilon}(g)}} . \]

Since $L$ is $\varepsilon$-error $z$-monochromatic, the definition of the corruption bound tells us that $\mathrm{cb}^{z,\rho}_{\varepsilon}(g) \le -\log \rho(L)$,
which contradicts the above inequality. □

Proof of Theorem 3.1. Suppose, to the contrary, that $\mathrm{IC}^{\rho}_{\varepsilon}(f) < \mathrm{scb}^{z,\rho}_{\varepsilon',\varepsilon}(f)/400 - 1/50$. Let $P^*$ be a
protocol for $f$ achieving the $\varepsilon$-error information cost under $\rho$. By a standard averaging argument, we may fix
the public randomness of $P^*$ to obtain a private-coin protocol $P$ that computes $f$ with error $2\varepsilon$, and has
$\mathrm{icost}^{\rho}(P) \le 2 \, \mathrm{icost}^{\rho}(P^*)$. Let $g$ be the function achieving the maximum in Eq. (3), the definition of the
smooth corruption bound, with error parameter $\varepsilon'$ and perturbation parameter $\varepsilon$. Then $\mathrm{scb}^{z,\rho}_{\varepsilon',\varepsilon}(f) = \mathrm{cb}^{z,\rho}_{\varepsilon'}(g)$
and $P$ computes $g$ with error $3\varepsilon \le 1/500$. Furthermore,

\[ \rho(g^{-1}(z)) \ge \rho(f^{-1}(z)) - \Pr_{(X,Y) \sim \rho}[f(X,Y) \ne g(X,Y)] \ge 3/20 - \varepsilon \ge 3/20 - 1/500 . \]

By Lemma 3.2, there exists a transcript $t$ of $P$ satisfying conditions (4), (5), and (6). The right hand side
of (5) is at most $100 \, \mathrm{icost}^{\rho}(P^*) < (\mathrm{scb}^{z,\rho}_{\varepsilon',\varepsilon}(f) - 7)/4 = (\mathrm{cb}^{z,\rho}_{\varepsilon'}(g) - 7)/4$ and the right hand side of (6) is at
most $24\varepsilon \le \varepsilon'/16$.

Therefore, conditions (11) and (12) in Lemma 3.6 are both satisfied (with $\varepsilon'$ in the role of $\varepsilon$), while $\mathrm{out}(t) = z$ and $\rho$ is
rectangular, which contradicts that lemma. □

4 The Information Complexity of Orthogonality and Gap-Hamming 

We now tackle Theorems 1.3 and 1.4. Since these results are closely connected with a few recent works,
and are both conceptually and technically interesting in their own right, we begin by discussing why they 
take so much additional work. 

For the remainder of this paper, $\mu_n$ will denote the uniform distribution on $\{-1,1\}^n \times \{-1,1\}^n$. We will
almost always drop the subscript $n$ and simply use $\mu$.

4.1 The Orthogonality Problem 

The first thing to address is why the information complexity of these problems is not already lower bounded 
by an existing general result of Barak et al. [BBCR10].

The Barak-Braverman-Chen-Rao Approach. The protocol compression technique given by Barak et
al. for rectangular distributions relates information complexity under such distributions to communication
complexity in what seems like a near-optimal way. Why then are we not happy with their result? To
understand this, consider a protocol $P$ for $\mathrm{ORT}_{1/4,n}$ with communication cost $c$, error $\varepsilon$ (for some sufficiently
small constant $\varepsilon$) and information cost $d$, under the uniform distribution $\mu$. Their compression result would
compress $P$ to a $2\varepsilon$-error ORT protocol $P^*$ with

\[ \mathrm{cost}(P^*) = O\!\left( \frac{d \log(c/\varepsilon)}{\varepsilon^2} \right) . \]

By the distributional complexity lower bound for $\mathrm{ORT}_{1/4,n}$ [She11a], we have $\mathrm{cost}(P^*) = \Omega(n)$. However,
this does not imply $d = \Omega(n)$ or even $d = \Omega(n / \mathrm{polylog}(n))$! In particular, we may have the weird situation
that $d = O(1)$ and $c = 2^{\Omega(n)}$. Thus, our lower bound for $\mathrm{IC}(\mathrm{ORT}_{b,n})$ is in fact a strong result, far from what
follows from prior work.

A Word About Our Approach. Turning to our proof for a moment, we now see that we need to lower
bound $\mathrm{cb}^{\lambda}(\mathrm{ORT}_{b,n})$ for a rectangular $\lambda$. We make the most natural choice, picking $\lambda = \mu$, the uniform input
distribution. Our proof is then heavily inspired by two recent proofs of an optimal $\Omega(n)$ lower bound on
$R(\mathrm{GHD}_n)$, namely those of Chakrabarti and Regev [CR11], and Sherstov [She11a]. At the heart of our proof
is the following anti-concentration lemma, which says that when pairs $(x,y)$ are randomly drawn from a
large rectangle in $\{-1,1\}^n \times \{-1,1\}^n$, the inner product $\langle x,y \rangle$ cannot be too sharply concentrated around
zero.
Lemma 4.1 (Anti-concentration). Let $n$ be sufficiently large, let $b \ge 66$ be a constant, and let $\varepsilon = \mathrm{tail}(2.01b)$. 
Then there exists $\delta > 0$ such that for all $A,B \subseteq \{-1,1\}^n$ with $\min\{|A|,|B|\} \ge 2^{n-\delta n}$, we have 

$$\Pr_{(X,Y) \in_R A \times B}\bigl[\langle X,Y\rangle \notin [-b\sqrt{n}, b\sqrt{n}]\bigr] \ge \varepsilon, \tag{13}$$

where "$\in_R$" denotes "is chosen uniformly at random from". 

The proof of this anti-concentration lemma has several technical steps, and we give this proof in Section 5. 
Below, we prove Theorem 1.3 using this lemma, and then discuss what is new about this lemma. 

Theorem 4.2 (Precise restatement of Theorem 1.3). Let $b > 1/5$ be a constant. Then $\mathrm{cb}_\theta^{1,\mu}(\mathrm{ORT}_{b,n}) = \Omega(n)$, 
for $\theta = \mathrm{tail}(2.01\max\{66,b\})$. Hence, we have $\mathrm{IC}^{\mu}_{\theta/400}(\mathrm{ORT}_{b,n}) = \Omega(n)$. 

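Proof. We first estimate the corruption bound. Let $\delta$ be the constant whose existence is guaranteed by 
Lemma 4.1. For $b \ge 66$, Eq. (13) states precisely that $\theta\text{-mono}^{1,\mu}(\mathrm{ORT}_{b,n}) < 2^{-\delta n}$. Thus, it follows that 
$\mathrm{cb}_\theta^{1,\mu}(\mathrm{ORT}_{b,n}) > \delta n = \Omega(n)$. For $b < 66$, we note that 

$$\Pr_{(X,Y) \in_R A \times B}\bigl[\langle X,Y\rangle \notin [-b\sqrt{n}, b\sqrt{n}]\bigr] \ge \Pr_{(X,Y) \in_R A \times B}\bigl[\langle X,Y\rangle \notin [-66\sqrt{n}, 66\sqrt{n}]\bigr]$$

for any $A,B \subseteq \{-1,1\}^n$. Therefore, using Lemma 4.1 as before, we can conclude that $\mathrm{cb}_\theta^{1,\mu}(\mathrm{ORT}_{b,n}) = \Omega(n)$ 
for $\theta = \mathrm{tail}(2.01 \times 66)$. 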

To lower bound the information complexity, we first note that 

$$\mathrm{scb}^{1,\mu}_{\theta,\theta/400}(\mathrm{ORT}_{b,n}) \ge \mathrm{scb}^{1,\mu}_{\theta,0}(\mathrm{ORT}_{b,n}) = \mathrm{cb}^{1,\mu}_{\theta}(\mathrm{ORT}_{b,n}) = \Omega(n).$$

Since $b > 1/5$, standard estimates of the tail of a binomial distribution give us that $\mu(\mathrm{ORT}_{b,n}^{-1}(1)) \ge 3/20$ for 
large enough $n$. Further, we have $\theta = \mathrm{tail}(2.01\max\{66,b\}) < 1/4$. Applying Theorem 3.1, we conclude 
that $\mathrm{IC}^{\mu}_{\theta/400}(\mathrm{ORT}_{b,n}) = \Omega(n)$. □ 
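The binomial estimate used in the proof above can be checked numerically. The following is a minimal sketch, assuming (as the surrounding text suggests) that $\mathrm{ORT}_{b,n}(x,y) = 1$ exactly when $|\langle x,y\rangle| \le b\sqrt{n}$; the function name `ort_one_mass` is ours and purely illustrative.

```python
import math

def ort_one_mass(n, b):
    """Exact value of Pr[|<X,Y>| <= b*sqrt(n)] for (X,Y) uniform on
    {-1,1}^n x {-1,1}^n. Here <X,Y> = 2K - n, where K ~ Binomial(n, 1/2)
    counts the coordinates on which X and Y agree.
    Assumption: ORT_{b,n}(x,y) = 1 iff |<x,y>| <= b*sqrt(n)."""
    bound = b * math.sqrt(n)
    total = sum(math.comb(n, k) for k in range(n + 1)
                if abs(n - 2 * k) <= bound)
    return total / 2 ** n

if __name__ == "__main__":
    # At the boundary value b = 1/5 the mass already exceeds 3/20.
    print(ort_one_mass(10000, 0.2))
```

For $b$ slightly above $1/5$ the value approaches $\Pr[|N(0,1)| \le b]$ as $n$ grows, which is where the threshold $3/20$ has room to spare.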

We now address why the approaches in two recent works do not suffice to prove Lemma 4.1. 



The Sherstov Approach. At first glance, Lemma 4.1 may appear to be essentially Sherstov's Theorem 
3.3, but it is not! Sherstov's theorem is a special case of ours that fixes $b = 1/4$, and the smallness of that 
choice is crucial to Sherstov's proof. In particular, his proof does not work once $b > 1$. In order to connect 
ORT to GHD, however, we need this anti-concentration with $b$ being a large constant. Looking ahead a bit, 
this is because we need the upper bound in Eq. (16) to be tight enough. 

The reason that Sherstov's approach requires $b$ to be small is technical, but here is a high-level overview. 
He relies on an inequality of Talagrand (which appears as [She11a, Fact 2.2]) which states that the projection 
of a random vector from $\{-1,1\}^n$ onto a linear subspace $V \subseteq \mathbb{R}^n$ is sharply concentrated around $\sqrt{\dim V}$, 
which is at most $\sqrt{n}$. Once $b > 1$, this sharp concentration works against his approach and, in particular, 
fails to imply anti-concentration of $\langle X,Y\rangle$ in $[-b\sqrt{n}, b\sqrt{n}]$, which is now too large an interval. 

The Chakrabarti-Regev Approach. At second glance, Lemma 4.1 may appear to be a variant of the 
"correlation inequality" (Theorem 3.5 and Corollary 3.8) of Chakrabarti and Regev. This is true to an 
extent, but crucially our lemma is not a corollary of that correlation inequality, which we state below. 

Fact 4.3 (Equivalent to Corollary 3.8 of [CR11]). Let $n$ be sufficiently large, and let $b > 0$ and $\varepsilon > 0$ be 
constants. Then there exists $\delta > 0$ such that for all $A,B \subseteq \{-1,1\}^n$ with $\min\{|A|,|B|\} \ge 2^{n-\delta n}$, we have 

$$\nu_b(A \times B) \ge (1-\varepsilon)\,\mu(A \times B), \tag{14}$$

where $\nu_b = \frac{1}{2}\bigl(\xi_{-2b/\sqrt{n}} + \xi_{2b/\sqrt{n}}\bigr)$ and $\xi_p$ is the distribution of $(x,y) \in \{-1,1\}^n \times \{-1,1\}^n$ where we pick 
$x \in_R \{-1,1\}^n$ and choose $y$ by flipping each coordinate of $x$ independently with probability $(1-p)/2$. 
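To make the distributions in Fact 4.3 concrete, here is a small sampling sketch. The function names (`sample_xi`, `sample_nu`, `mean_inner_xi`) are illustrative and not from the paper. Under $\xi_p$ each coordinate product $x_i y_i$ equals $+1$ with probability $(1+p)/2$, so the expected inner product is $pn$; for $p = 2b/\sqrt{n}$ that is $2b\sqrt{n}$.

```python
import math
import random

def sample_xi(n, p, rng):
    """Draw (x, y) from xi_p: x is uniform in {-1,1}^n and y is x with each
    coordinate flipped independently with probability (1 - p) / 2."""
    x = [rng.choice((-1, 1)) for _ in range(n)]
    y = [-xi if rng.random() < (1 - p) / 2 else xi for xi in x]
    return x, y

def sample_nu(n, b, rng):
    """Draw (x, y) from nu_b, the equal mixture of xi_{-p} and xi_{+p}
    with p = 2b / sqrt(n)."""
    p = 2 * b / math.sqrt(n)
    return sample_xi(n, rng.choice((-p, p)), rng)

def inner(x, y):
    return sum(a * c for a, c in zip(x, y))

def mean_inner_xi(n, p, trials, seed=0):
    """Empirical mean of <x, y> under xi_p (should be close to p * n)."""
    rng = random.Random(seed)
    return sum(inner(*sample_xi(n, p, rng)) for _ in range(trials)) / trials

if __name__ == "__main__":
    # n = 400, b = 1 gives p = 0.1 and expected inner product 2*b*sqrt(n) = 40.
    print(mean_inner_xi(400, 0.1, 2000))
```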
The above is also an anti-concentration statement about inner products in a large rectangle. One might 
therefore hope to use it to prove Lemma 4.1 by showing that one kind of anti-concentration implies the other 
for "counting" reasons. That is, one might hope that every large set $S \subseteq \{-1,1\}^n \times \{-1,1\}^n$ that satisfies 
an inequality like (14) also satisfies one like (13). 

But this is not the case. Consider the set $S = S_0 \cup S_{4b}$, where $S_0$ is any subset of $2^{2n-\delta n}$ inputs such that 
for all $(x,y) \in S_0$ we have $\langle x,y\rangle = 0$, and $S_{4b}$ is any subset of $(\varepsilon/2)|S_0|$ inputs such that for all $(x,y) \in S_{4b}$, 
we have $\langle x,y\rangle = 4b\sqrt{n}$. Then, by construction, we have $\Pr_{(x,y) \in_R S}[\langle x,y\rangle \notin [-b\sqrt{n}, b\sqrt{n}]] \le \varepsilon/2 < \varepsilon$, so $S$ 
does not satisfy an inequality like (13). However, for several choices of $\varepsilon$ and $b$, it does satisfy the analogue 
of inequality (14): a short calculation shows that $\nu_b(S) \ge \frac{1}{2}\xi_{2b/\sqrt{n}}(S_{4b}) \ge \mu(S)$. 
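The mechanism behind this counterexample can be illustrated with a back-of-envelope computation of our own (it is not the paper's calculation). A pair with $\langle x,y\rangle = k$ has $\xi_p$-probability $4^{-n}(1-p^2)^{n/2}\bigl(\tfrac{1+p}{1-p}\bigr)^{k/2}$, while $\mu$ gives it $4^{-n}$. The sketch below evaluates the logarithm of this ratio; the helper names are hypothetical.

```python
import math

def log_density_ratio(n, p, k):
    """ln( xi_p(x, y) / mu(x, y) ) for any pair (x, y) with <x, y> = k.
    Derived from: (n + k)/2 coordinates agree, each with probability
    (1 + p)/2, and (n - k)/2 disagree, each with probability (1 - p)/2."""
    return (0.5 * n * math.log1p(-p * p)
            + 0.5 * k * (math.log1p(p) - math.log1p(-p)))

def log_ratio_at_4b(n, b):
    """The same quantity at p = 2b/sqrt(n) and k = 4b*sqrt(n)."""
    p = 2 * b / math.sqrt(n)
    k = 4 * b * math.sqrt(n)
    return log_density_ratio(n, p, k)

if __name__ == "__main__":
    # For large n the log-ratio tends to -2b^2 + 8b^2 = 6b^2.
    print(log_ratio_at_4b(10**6, 1.0))
```

So each point of $S_{4b}$ carries roughly $e^{6b^2}$ times more $\xi_{2b/\sqrt{n}}$-mass than $\mu$-mass, which is how a set of relative size about $\varepsilon/2$ can dominate $\nu_b(S)$ once $b$ is large relative to $\ln(1/\varepsilon)$.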

Thus, even given Fact 4.3, we still need to use the rectangularity of $S$ to prove Lemma 4.1. It is this need 
to use rectangularity carefully that leads to the longish technical proof to follow, in Section 5. 

4.2 The Gap-Hamming Problem 

We now address the issue of proving a strong lower bound on $\mathrm{IC}^{\mu}(\mathrm{GHD})$. As before, we first note why 
existing methods do not imply an $\Omega(n)$ lower bound, and then give our approach. We stress that our approach 
is, at this point, a program only and stops short of settling Conjecture 1.5, i.e., proving that $\mathrm{IC}^{\mu}(\mathrm{GHD}) = \Omega(n)$. 

Previous Approaches. The orthogonality problem ORT is intimately related to the Gap-Hamming Distance 
problem GHD. This was first noted by Sherstov, who used an ingenious technique to prove that 
$R(\mathrm{GHD}_n) = \Omega(n)$ based on his lower bound $R(\mathrm{ORT}_{1/4,n}) = \Omega(n)$. He gave a reduction from ORT to GHD 
wherein a protocol for GHD was called twice to obtain a protocol for ORT. But this style of reduction does 
not yield a relation between information complexities, and so the lower bound on $\mathrm{IC}^{\mu}(\mathrm{ORT})$ in Theorem 4.2 
does not translate into a lower bound on $\mathrm{IC}^{\mu}(\mathrm{GHD})$. 

The Chakrabarti-Regev proof [CR11] of the same bound $R(\mathrm{GHD}_n) = \Omega(n)$ introduces a technique that 
they call corruption-with-jokers which in turn is subsumed by what Jain and Klauck [JK10] have called the 
"smooth rectangle bound." In fact, Jain and Klauck define two variants of the smooth rectangle bound: a 
linear-programming-based variant that they denote srec, and a "natural" variant that they denote $\widetilde{\mathrm{srec}}$. It is 
the former variant that subsumes the Chakrabarti-Regev technique, whereas our work here corresponds to 
the latter variant. 

Jain and Klauck do give a pair of translation lemmas, showing that the two variants are asymptotically 
equivalent up to some changes in parameters. Therefore, the Chakrabarti-Regev approach does yield a lower 
bound on $\mathrm{scb}^{\lambda}(\mathrm{GHD}_n)$, but the distribution $\lambda$ that comes out of applying the appropriate translation lemma 
is non-rectangular. Therefore, we cannot apply Theorem 3.1. 

Furthermore, even granting Conjecture 1.1, as claimed by Kerenidis et al. [KLL+12], this line of reasoning 
will only lower bound $\mathrm{IC}^{\lambda}(\mathrm{GHD})$ for an artificial distribution $\lambda$, and will not lower bound $\mathrm{IC}^{\mu}(\mathrm{GHD})$. 

Our Approach. Our idea is that, for large $b$, the function $\mathrm{GHD}_n$ is at least as "hard" as a function that 
is "close" to $\mathrm{ORT}_{b,n}$, under a uniform input distribution. To be precise, we have the following connection 
between GHD and ORT. Recall that $\mu_n$ is the uniform distribution on $\{-1,1\}^n \times \{-1,1\}^n$. 

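Theorem 4.4 (Precise restatement of Theorem 1.4). Let $n$ be sufficiently large, let $b > 100$ be a constant, 
and let $\mathrm{tail}(1.99b) < \theta < 1/1600$. Let $n' = n + \frac{1}{2}(1.99b-1)\sqrt{n}$. Then, we have 

$$\mathrm{scb}^{1,\mu_n}_{400\theta,\theta}(\mathrm{GHD}_n) = \Omega\bigl(\mathrm{cb}^{1,\mu_{n'}}_{400\theta}(\mathrm{ORT}_{b,n'})\bigr) - O(\sqrt{n}).$$

Combining this with Theorem 3.1, we then have $\mathrm{IC}^{\mu_n}_{\theta}(\mathrm{GHD}_n) = \Omega\bigl(\mathrm{cb}^{1,\mu_{n'}}_{400\theta}(\mathrm{ORT}_{b,n'})\bigr) - O(\sqrt{n})$. 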
Remark. Suppose we could strengthen Theorem 4.2 by changing the constant 2.01 in that theorem to 1.98, 
i.e., suppose we had $\mathrm{cb}_\varepsilon^{1,\mu}(\mathrm{ORT}_{b,n}) = \Omega(n)$ with $\varepsilon = \mathrm{tail}(1.98b)$. Then the present theorem would give us 
$\mathrm{IC}^{\mu}_{\varepsilon/400}(\mathrm{GHD}_n) = \Omega(n)$, since $\varepsilon/400 > \mathrm{tail}(1.99b)$ for large enough $b$. 

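Proof. Put $t = n' - n = \frac{1}{2}(1.99b-1)\sqrt{n}$. Consider the padding $(x,y) \in \{-1,1\}^n \times \{-1,1\}^n \mapsto (x',y') \in \{-1,1\}^{n'} \times \{-1,1\}^{n'}$ 
defined by $x' = (1,1,\dots,1,x)$ and $y' = (-1,-1,\dots,-1,y)$. Then we have $\langle x',y'\rangle = \langle x,y\rangle - t$. Since $b > 100$, 
for $b' := 1.99b$, we have 

$$\langle x,y\rangle \in [-\sqrt{n}, b'\sqrt{n}] \implies \langle x',y'\rangle \in [-b\sqrt{n'}, b\sqrt{n'}]. \tag{15}$$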
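The padding step can be sanity-checked with a few lines of code. This is only a sketch: `pad`, `inner` and `padded_halfwidth_ok` are illustrative names, and `t` is taken to be an integer. Subtracting $t = \frac{1}{2}(1.99b-1)\sqrt{n}$ maps $[-\sqrt{n}, 1.99b\sqrt{n}]$ onto the symmetric interval of half-width $\frac{1}{2}(1.99b+1)\sqrt{n}$, which fits inside $[-b\sqrt{n}, b\sqrt{n}] \subseteq [-b\sqrt{n'}, b\sqrt{n'}]$ exactly when $1.99b + 1 \le 2b$.

```python
import random

def pad(x, y, t):
    """Prepend t coordinates equal to +1 to x and t coordinates equal to -1
    to y, as in the padding map (t is assumed to be a nonnegative integer)."""
    return [1] * t + list(x), [-1] * t + list(y)

def inner(x, y):
    return sum(a * c for a, c in zip(x, y))

def padded_halfwidth_ok(b):
    """After dividing by sqrt(n), implication (15) reduces to
    (1.99*b + 1) / 2 <= b."""
    return (1.99 * b + 1) / 2 <= b

if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.choice((-1, 1)) for _ in range(50)]
    y = [rng.choice((-1, 1)) for _ in range(50)]
    xp, yp = pad(x, y, 7)
    print(inner(x, y), inner(xp, yp))  # the second value is smaller by 7
```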
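Let $h : \{-1,1\}^n \times \{-1,1\}^n \to \{-1,1\}$ be the partial function defined as follows: 

$$h(x,y) = \begin{cases} \mathrm{GHD}_n(x,y), & \text{if } \langle x,y\rangle \le b'\sqrt{n}, \\ -\mathrm{GHD}_n(x,y), & \text{if } \langle x,y\rangle > b'\sqrt{n}. \end{cases}$$

From (15) and the definition of ORT we can conclude that $\mathrm{ORT}_{b,n'}(x',y') \ne 1 \implies h(x,y) \notin \{1,*\}$ for all 
$x,y \in \{-1,1\}^n$. Thus, for any rectangle $R \subseteq \{-1,1\}^n \times \{-1,1\}^n$, we have 

$$\frac{|\{(x,y) \in R : h(x,y) \notin \{1,*\}\}|}{|R|} \ge \frac{|\{(x',y') \in R' : \mathrm{ORT}_{b,n'}(x',y') \ne 1\}|}{|R'|},$$

where $R' \subseteq \{-1,1\}^{n'} \times \{-1,1\}^{n'}$ is the rectangle obtained by padding each $(x,y) \in R$ as above. Therefore, 
if $R$ is $\varepsilon$-error 1-monochromatic for $h$ under $\mu_n$, then $R'$ is $\varepsilon$-error 1-monochromatic for $\mathrm{ORT}_{b,n'}$ under $\mu_{n'}$. 

Hence, $\varepsilon\text{-mono}^{1,\mu_n}(h) \le 2^{2t}\,\varepsilon\text{-mono}^{1,\mu_{n'}}(\mathrm{ORT}_{b,n'})$ and thus, $\mathrm{cb}_\varepsilon^{1,\mu_n}(h) \ge \mathrm{cb}_\varepsilon^{1,\mu_{n'}}(\mathrm{ORT}_{b,n'}) - 2t$. 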
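By standard estimates of the tail of a binomial distribution [Fel68], we have 

$$\Pr_{(X,Y) \sim \mu_n}[h(X,Y) \ne \mathrm{GHD}_n(X,Y)] = \Pr_{(X,Y) \sim \mu_n}[\langle X,Y\rangle > b'\sqrt{n}] \le \mathrm{tail}(b') = \mathrm{tail}(1.99b). \tag{16}$$

Therefore, $\mathrm{scb}^{1,\mu_n}_{\varepsilon,\theta}(\mathrm{GHD}_n) \ge \mathrm{cb}_\varepsilon^{1,\mu_n}(h) \ge \mathrm{cb}_\varepsilon^{1,\mu_{n'}}(\mathrm{ORT}_{b,n'}) - 2t$ with $\theta > \mathrm{tail}(1.99b)$. The proof is now completed 
by applying Theorem 3.1: for the setting $\varepsilon = 400\theta$, we have $0 < 384\theta < \varepsilon < 1/4$ and $\mu_n(\mathrm{GHD}_n^{-1}(1)) \ge 3/20$. 
Therefore, we can conclude 

$$\mathrm{IC}^{\mu_n}_{\theta}(\mathrm{GHD}_n) = \Omega\bigl(\mathrm{scb}^{1,\mu_n}_{\varepsilon,\theta}(\mathrm{GHD}_n)\bigr) - O(1) = \Omega\bigl(\mathrm{cb}_\varepsilon^{1,\mu_{n'}}(\mathrm{ORT}_{b,n'})\bigr) - O(\sqrt{n}).$$ □ 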

5 Proof of the Anti-Concentration Lemma 

Finally, we turn to the most technical part of this work: a proof of our new anti-concentration lemma, stated 
as Lemma 4.1 earlier. 

5.1 Preparatory Work and Proof Overview 

Let us begin with some convenient notation. We denote the (density function of the) standard normal 
distribution on the real line $\mathbb{R}$ by $\gamma$. We also denote the standard $n$-dimensional Gaussian distribution by 
$\gamma^n$. For a set $A \subseteq \mathbb{R}^n$, we denote by $\gamma^n|_A$ the distribution $\gamma^n$ conditioned on belonging to $A$. For a distribution 
$P$ on $\mathbb{R}^n$, we define its "distance to Gaussianity", denoted $D_\gamma(P)$, as follows. 

$$D_\gamma(P) = D(P \,\|\, \gamma^n) := \int P(x) \ln\frac{P(x)}{\gamma^n(x)}\,dx.$$
The latter quantity is the well-known relative entropy for continuous probability distributions, and is the 
analogue of $D_{\mathrm{KL}}$, which we have used earlier. Note that the logarithm here is to the base $e$, and not 2 as it 
was earlier. 
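As a quick illustration of the "distance to Gaussianity" in one dimension, the sketch below numerically integrates $P(x)\ln(P(x)/\gamma(x))$ for a shifted normal $P = N(m,1)$, for which the exact value is $m^2/2$ nats. The function names and the integration range are our own choices, not part of the paper.

```python
import math

def log_normal_pdf(x, mean=0.0):
    """Natural log of the N(mean, 1) density at x."""
    return -0.5 * (x - mean) ** 2 - 0.5 * math.log(2 * math.pi)

def divergence_to_gaussian(mean, lo=-12.0, hi=12.0, steps=24000):
    """Midpoint-rule estimate of D(P || gamma) in nats for P = N(mean, 1).
    Working with log-densities avoids dividing two tiny numbers."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        log_p = log_normal_pdf(x, mean)
        total += math.exp(log_p) * (log_p - log_normal_pdf(x)) * h
    return total

if __name__ == "__main__":
    d_nats = divergence_to_gaussian(1.0)
    print(d_nats, d_nats / math.log(2))  # nats, then bits
```

Dividing by $\ln 2$ converts to the base-2 convention used for $D_{\mathrm{KL}}$ earlier in the paper.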

Let $X,Y$ be possibly correlated random variables, with density functions $P_X$ and $P_Y$ respectively. Let 
$P_{X|Y=y}$ denote the conditional probability density function of $X$ given the value $y$ of $Y$. We will sometimes 
write $D_\gamma(X)$ as shorthand for $D_\gamma(P_X)$, and we will define 

$$D_\gamma(X \mid Y) = \mathbb{E}_y\bigl[D(P_{X|Y=y} \,\|\, \gamma)\bigr].$$

For a vector $x \in \mathbb{R}^n$ and a linear subspace $V \subseteq \mathbb{R}^n$, we denote the orthogonal projection of $x$ onto $V$ by 
$\mathrm{proj}_V\, x$. We denote the Euclidean norm of $x$ by $\|x\|$. 



The Setup. For a contradiction, we begin by assuming the negation of Lemma 4.1. That is, we assume 
that there is a constant $b \ge 66$ such that for all constants $\delta > 0$, there exist $A,B \subseteq \{-1,1\}^n$ such that 

$$\min\{|A|,|B|\} \ge 2^{n-\delta n} \quad\text{and} \tag{17}$$

$$\Pr_{(X,Y) \in_R A \times B}\bigl[\langle X,Y\rangle \notin [-b\sqrt{n}, b\sqrt{n}]\bigr] < \varepsilon := \mathrm{tail}(2.01b). \tag{18}$$

We treat the sets $A$ and $B$ asymmetrically in the proof. Using the largeness of $A$, and appealing to a 
concentration inequality of Talagrand, we identify a subset $V \subseteq A$ consisting of $\Theta(n)$ vectors such that 

(P1) the vectors in $V$ are, in some sense, near-orthogonal; and 

(P2) the quantity $\langle x,Y\rangle$, where $Y \in_R B$, is concentrated around zero for each $x \in V$, in the sense of (18). 

This step is a simple generalization of the first part of Sherstov's argument in his proof that $R(\mathrm{GHD}_n) = \Omega(n)$. 

As for the set $B$, we consider its Gaussian analogue $\tilde{B} := \{y \in \mathbb{R}^n : \mathrm{sign}(y) \in B\}$. Consider the random 
variable $Q_x = \langle x,\tilde{Y}\rangle/\sqrt{n}$, for an arbitrary $x \in V$ and $\tilde{Y} \sim \gamma^n|_{\tilde{B}}$. On the one hand, we can show that property 
(P2) above implies "concentration" for $Q_x$ in some sense. Combined with property (P1), we have that 
projections of the set $\tilde{B}$ along $\Omega(n)$ near-orthogonal directions are all "concentrated." On the other hand, 
arguing along the lines of Chakrabarti-Regev, we cannot have too much concentration along so many 
near-orthogonal directions, because $\tilde{B}$ is a "large" subset of $\mathbb{R}^n$. The incompatibility of these two behaviors of $Q_x$ 
gives us our desired contradiction. 

It remains to identify a suitable notion of "concentration" that lets us carry out the above program. The 
notion we choose is the escape probability $p_x^* := \Pr[|Q_x| > (c+\alpha)b]$, for suitable constants $c, \alpha > 0$ that we 
shall determine later. 



5.2 The Actual Proof 

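Let $Y$ denote a uniformly distributed vector in $B$. Define the set 

$$C := \Bigl\{x \in A : \Pr_{Y \in_R B}\bigl[\langle x,Y\rangle \notin [-b\sqrt{n}, b\sqrt{n}]\bigr] < 2\varepsilon\Bigr\}. \tag{19}$$

By Eq. (18) and Markov's inequality, we have $|C| \ge \frac{1}{2}|A| \ge 2^{n-\delta n-1}$. We now use some geometry. 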
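The Markov step can be replayed on a toy instance. The sketch below is illustrative only: the names `escape_fraction` and `markov_subset` are ours, and `eps` is set just above the average escape probability of $A \times B$ so that the strict inequality of (18) holds. Markov's inequality then forces more than half of $A$ into $C$.

```python
import itertools
import math
import random

def escape_fraction(x, B, b):
    """Fraction of y in B with <x, y> outside [-b*sqrt(n), b*sqrt(n)]."""
    bound = b * math.sqrt(len(x))
    bad = sum(1 for y in B if abs(sum(a * c for a, c in zip(x, y))) > bound)
    return bad / len(B)

def markov_subset(A, B, b):
    """Return (eps, C), where eps is slightly above the escape probability
    of the rectangle A x B and C = {x in A : escape_fraction(x) < 2*eps},
    mirroring the definition in Eq. (19)."""
    fractions = [escape_fraction(x, B, b) for x in A]
    eps = sum(fractions) / len(A) + 1e-12
    C = [x for x, f in zip(A, fractions) if f < 2 * eps]
    return eps, C

if __name__ == "__main__":
    cube = list(itertools.product((-1, 1), repeat=8))
    rng = random.Random(1)
    A, B = rng.sample(cube, 100), rng.sample(cube, 100)
    eps, C = markov_subset(A, B, 1.0)
    print(eps, len(C), len(A))
```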

Fact 5.1 (Generalization of [She11a, Lemma 3.1]). Let $\delta > 0$ be a sufficiently small constant and let $n$ be 
large enough. Put $k = \lceil\sqrt{\delta}\,n\rceil$. Suppose $C \subseteq \{-1,1\}^n$ has size $|C| \ge 2^{n-\delta n-1}$. Then there exist $x_1,\dots,x_k \in C$ 
such that 

$$\forall j \in \{1,\dots,k\}, \text{ we have } \bigl\|\mathrm{proj}_{\mathrm{span}\{x_1,x_2,\dots,x_{j-1}\}}\, x_j\bigr\| \le 2\delta^{1/4}\sqrt{n}. \tag{20}$$
Proof. Having chosen $x_1,\dots,x_{j-1}$ (where $j \le k$), we apply the appropriate variant of Talagrand's concentration 
inequality [AS00, Theorem 7.6.1] to obtain that $\|\mathrm{proj}_{\mathrm{span}\{x_1,\dots,x_{j-1}\}}\, x_j\|$ is sharply concentrated around 
$\sqrt{\dim \mathrm{span}\{x_1,\dots,x_{j-1}\}} \le \sqrt{k}$. In particular, there is an absolute constant $c$ such that 

$$\Pr_{x_j \in_R \{-1,1\}^n}\Bigl[\bigl\|\mathrm{proj}_{\mathrm{span}\{x_1,\dots,x_{j-1}\}}\, x_j\bigr\| > 2\delta^{1/4}\sqrt{n}\Bigr] \le 2^{-c\sqrt{\delta}\,n}.$$

On the other hand $\Pr_{x \in_R \{-1,1\}^n}[x \in C] \ge 2^{-\delta n-1}$, which is larger than the above estimate if $\delta$ is sufficiently 
small. Therefore, we can pick a suitable $x_j$ to continue. □ 

From now on, fix the "near-orthogonal" set of vectors $x_1,\dots,x_k$, with $k = \lceil\sqrt{\delta}\,n\rceil$, given by Fact 5.1. 
Recall that $\tilde{B} := \{y \in \mathbb{R}^n : \mathrm{sign}(y) \in B\}$. We define a random variable $\tilde{Y}$ correlated with $Y$ as follows. Let 
$(Y_1,\dots,Y_n)$ be the coordinates of $Y$; then define $\tilde{Y}_i = Y_i|W_i|$, where $W_i \sim \gamma$, and put $\tilde{Y} = (\tilde{Y}_1,\dots,\tilde{Y}_n)$. Notice 
that the resulting distribution of $\tilde{Y}$ is exactly $\gamma^n|_{\tilde{B}}$. We now define the random variable $Q_j$ and its escape 
probability $p_j^*$ as follows. 

$$Q_j := \langle x_j,\tilde{Y}\rangle/\sqrt{n}; \qquad p_j^* := \Pr[|Q_j| > (c+\alpha)b].$$

We shall eventually fix a particular index $j$ and choose suitable constants $c$ and $\alpha$ above. As mentioned in 
the overview, the proof will hinge on a careful analysis of this escape probability. 

Lower Bounding the Escape Probability 

We begin the study by showing that there exists an index $j \in \{1,\dots,k\}$ such that $Q_j$ behaves quite similarly 
to a mixture of shifted standard normal variables (i.e., variances close to 1, but arbitrary means). This will 
in turn yield a lower bound on the corresponding $p_j^*$. 

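Let $\bar{x}_1,\dots,\bar{x}_k$ be the (truly) orthogonal vectors obtained from $x_1,\dots,x_k$ by the Gram-Schmidt process, 
i.e., $\bar{x}_i := x_i - \mathrm{proj}_{\mathrm{span}\{x_1,\dots,x_{i-1}\}}\, x_i$. For $i \in \{1,\dots,k\}$, put $x_i^* = \bar{x}_i/\|\bar{x}_i\|$, and let $x_{k+1}^*,\dots,x_n^*$ be a completion 
of these vectors to an orthonormal basis of $\mathbb{R}^n$. Expressing $\tilde{Y}$ in this basis, and noting that $\langle x_j,x_i^*\rangle = 0$ for 
all $i > j$ in step (21) below, we derive 

$$Q_j = \frac{1}{\sqrt{n}}\Bigl\langle x_j, \sum_{i=1}^{n} \langle\tilde{Y},x_i^*\rangle\, x_i^*\Bigr\rangle$$

$$= \sum_{i=1}^{j} \frac{\langle x_j,x_i^*\rangle}{\sqrt{n}}\,\langle\tilde{Y},x_i^*\rangle \tag{21}$$

$$= r_j Z_j + S_j, \tag{22}$$

where we define 

$$r_j := \frac{\langle x_j,x_j^*\rangle}{\sqrt{n}}, \qquad Z_j := \langle\tilde{Y},x_j^*\rangle, \qquad S_j := \sum_{i=1}^{j-1} \frac{\langle x_j,x_i^*\rangle}{\sqrt{n}}\,\langle\tilde{Y},x_i^*\rangle.$$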

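The Pythagorean theorem says that $\langle x_j,x_j^*\rangle^2 = \|x_j\|^2 - \|\mathrm{proj}_{\mathrm{span}\{x_1,\dots,x_{j-1}\}}\, x_j\|^2$. Recalling that $\|x_j\| = \sqrt{n}$ 
and using (20), we get $r_j^2 \ge 1 - 4\sqrt{\delta}$; since $0 \le r_j \le 1$, we conclude that 

$$\forall j \in \{1,\dots,k\}, \text{ we have } 1 - 4\sqrt{\delta} \le r_j \le 1. \tag{23}$$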
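The Gram-Schmidt bookkeeping behind (21) to (23) can be checked numerically. The following pure-Python sketch (function names are ours; the input vectors are assumed linearly independent) computes each $r_j$ together with the norm of the projection of $x_j$ onto the span of its predecessors, so that the Pythagorean identity $r_j^2 = 1 - \|\mathrm{proj}\|^2/n$ can be verified directly.

```python
import math
import random

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def gram_schmidt_r(xs):
    """For sign vectors x_1..x_k in {-1,1}^n (assumed linearly independent),
    return a list of pairs (r_j, proj_norm_j), where
    r_j = <x_j, x_j^*> / sqrt(n) and proj_norm_j is the Euclidean norm of
    the projection of x_j onto span{x_1, ..., x_{j-1}}."""
    n = len(xs[0])
    basis = []  # orthonormal vectors x_1^*, ..., x_{j-1}^*
    out = []
    for x in xs:
        coeffs = [dot(x, e) for e in basis]
        proj_norm = math.sqrt(sum(c * c for c in coeffs))
        residual = [xi - sum(c * e[i] for c, e in zip(coeffs, basis))
                    for i, xi in enumerate(x)]
        norm = math.sqrt(dot(residual, residual))
        star = [v / norm for v in residual]
        out.append((dot(x, star) / math.sqrt(n), proj_norm))
        basis.append(star)
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 64, 8
    xs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]
    for r, proj_norm in gram_schmidt_r(xs):
        print(round(r, 4), round(proj_norm, 4))
```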
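Lemma 5.2. There exists $j \in \{1,\dots,k\}$ such that $D_\gamma(Z_j \mid S_j) \le \sqrt{\delta}$. 

Proof. Since $|B| \ge 2^{n-\delta n}$, we have $\gamma^n(\tilde{B}) \ge 2^{-\delta n}$. By definition, we have $D_\gamma(\gamma^n|_{\tilde{B}}) = -\ln\gamma^n(\tilde{B}) \le (\ln 2)\delta n < 
\delta n$. On the other hand, by the chain rule for relative entropy, we have 

$$D_\gamma(\gamma^n|_{\tilde{B}}) = D_\gamma(\tilde{Y}) = D_\gamma\bigl(\langle\tilde{Y},x_1^*\rangle,\dots,\langle\tilde{Y},x_n^*\rangle\bigr) = \sum_{j=1}^{n} D_\gamma\bigl(\langle\tilde{Y},x_j^*\rangle \bigm| \langle\tilde{Y},x_1^*\rangle,\dots,\langle\tilde{Y},x_{j-1}^*\rangle\bigr).$$

Recalling that $k = \lceil\sqrt{\delta}\,n\rceil$, we deduce that there exists an index $j \in \{1,\dots,k\}$ such that 

$$D_\gamma\bigl(\langle\tilde{Y},x_j^*\rangle \bigm| \langle\tilde{Y},x_1^*\rangle,\dots,\langle\tilde{Y},x_{j-1}^*\rangle\bigr) \le \sqrt{\delta}. \tag{24}$$

Since $S_j$ is a function of $\langle\tilde{Y},x_1^*\rangle,\dots,\langle\tilde{Y},x_{j-1}^*\rangle$, we conclude that $D_\gamma(Z_j \mid S_j) \le \sqrt{\delta}$. □ 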
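The two quantitative steps in this proof can be spelled out as follows. This is our own expansion, assuming $k \ge \sqrt{\delta}\,n$ and using the fact that every term in the chain-rule sum is nonnegative.

```latex
% The conditioned Gaussian has density \gamma^n(y) / \gamma^n(\tilde B) on \tilde B, hence
D_\gamma\bigl(\gamma^n|_{\tilde B}\bigr)
  = \int_{\tilde B} \frac{\gamma^n(y)}{\gamma^n(\tilde B)}
      \ln\frac{1}{\gamma^n(\tilde B)}\,dy
  = -\ln \gamma^n(\tilde B)
  \le (\ln 2)\,\delta n .

% Dropping the (nonnegative) terms with index above k and averaging over the first k:
\min_{1 \le j \le k}
  D_\gamma\bigl(\langle \tilde Y, x_j^*\rangle \bigm|
      \langle \tilde Y, x_1^*\rangle, \dots, \langle \tilde Y, x_{j-1}^*\rangle\bigr)
  \le \frac{\delta n}{k}
  \le \frac{\delta n}{\sqrt{\delta}\, n}
  = \sqrt{\delta} .
```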

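For the rest of our proof, we fix an index $j$ as guaranteed by Lemma 5.2. We put $r = r_j$, $Z = Z_j$, $S = S_j$, 
$Q = Q_j$, and $p^* = p_j^*$. Now define the set 

$$\mathcal{S} := \{s \in \mathbb{R} : D_\gamma(Z \mid S = s) \le \delta^{1/4}\},$$

so that $\Pr[S \notin \mathcal{S}] \le \delta^{1/4}$ by Markov's inequality. Clearly, either $\Pr[S \ge 0 \mid S \in \mathcal{S}] \ge \frac{1}{2}$ or $\Pr[S < 0 \mid S \in \mathcal{S}] \ge \frac{1}{2}$. 
In what follows, we shall assume that the former condition holds; it will soon be clear that this 
does not lose generality. Under this assumption we have 

$$D_\gamma(Z \mid S \ge 0 \wedge S \in \mathcal{S}) \le 2\delta^{1/4}. \tag{25}$$

Therefore, by Pinsker's inequality [CT06], the statistical distance between the distribution $\gamma$ and the distribution 
of $(Z \mid S \ge 0 \wedge S \in \mathcal{S})$ is at most $\sqrt{2(2\delta^{1/4})} = 2\delta^{1/8}$. Using this fact below, we get 

$$\begin{aligned}
p^* &\ge \Pr[Q > (c+\alpha)b] \\
&\ge \Pr[rZ + S > (c+\alpha)b \mid S \ge 0 \wedge S \in \mathcal{S}] \cdot \Pr[S \ge 0 \mid S \in \mathcal{S}] \cdot \Pr[S \in \mathcal{S}] \\
&\ge \tfrac{1}{2}(1-\delta^{1/4})\,\Pr[rZ + S > (c+\alpha)b \mid S \ge 0 \wedge S \in \mathcal{S}] \\
&\ge \tfrac{1}{2}(1-\delta^{1/4})\,\Pr[Z > (c+\alpha)b/r \mid S \ge 0 \wedge S \in \mathcal{S}] \\
&\ge \tfrac{1}{2}(1-\delta^{1/4})\,\bigl(\mathrm{tail}((c+\alpha)b/r) - 2\delta^{1/8}\bigr) \\
&\ge \frac{1-\delta^{1/4}}{2}\left(\mathrm{tail}\!\left(\frac{(c+\alpha)b}{1-4\sqrt{\delta}}\right) - 2\delta^{1/8}\right),
\end{aligned} \tag{26}$$

where the final step uses the lower bound on $r$ given by (23). 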
Upper Bounding the Escape Probability 

Recall that we had fixed a specific index $j$ after the proof of Lemma 5.2 and that $Q = Q_j = \langle x_j,\tilde{Y}\rangle/\sqrt{n}$. 
We shall now explore the relation between $\langle x_j,\tilde{Y}\rangle$ and $\langle x_j,Y\rangle$ to upper bound the escape probability. At this 
point it would help to review the discussion of the relation between $Y$ and $\tilde{Y}$ at the beginning of Section 5.2. 

For simplicity, we put $x := x_j$ and assume, w.l.o.g., that $x = (1,1,\dots,1)$ so that $\langle x,y\rangle = \sum_{i=1}^{n} y_i$. This is 
legitimate because, if $x_i = -1$, we can flip $x_i$ to $1$ and $y_i$ to $-y_i$ without changing $\langle x,y\rangle$. 

Recall that each coordinate $\tilde{Y}_i$ of $\tilde{Y}$ has the same distribution as $Y_i|W_i|$, where the variables $\{W_i\}$ are 
independent and each $W_i \sim \gamma$. Define $T := \sum_{i=1}^{n} Y_i/\sqrt{n}$; note that $T$ is a discrete random variable. After 
some reordering of coordinates, we can rewrite 

$$\sqrt{n}\,Q = \langle x,\tilde{Y}\rangle = \bigl(|W_1| + |W_2| + \cdots + |W_{(n+T\sqrt{n})/2}|\bigr) - \bigl(|W_{(n+T\sqrt{n})/2+1}| + \cdots + |W_n|\bigr).$$

Each $|W_i|$ has a so-called half normal distribution. This is a well-studied distribution: in particular, for each 
$i$, we know that 

$$\mathbb{E}[|W_i|] = \sqrt{2/\pi}, \qquad \mathrm{Var}[|W_i|] = 1 - 2/\pi.$$

Thus, for each value $t$ in the range of $T$, we have $\mathbb{E}[\sqrt{n}\,Q \mid T = t] = t\sqrt{2n/\pi}$ by linearity of expectation, and 
$\mathrm{Var}[\sqrt{n}\,Q \mid T = t] = (1 - 2/\pi)n$ by the independence of the variables $\{W_i\}$. The half-normal distribution is 
well-behaved enough for us to apply Lindeberg's version of the central limit theorem [Fel68]: doing so tells 
us that as $n$ grows, the distribution of 

$$\frac{\sqrt{n}\,Q^{(t)} - \mathbb{E}[\sqrt{n}\,Q^{(t)}]}{\sqrt{\mathrm{Var}[\sqrt{n}\,Q^{(t)}]}} = \frac{Q^{(t)} - t\sqrt{2/\pi}}{\sqrt{1 - 2/\pi}}$$

converges to $\gamma$, where $Q^{(t)} = (Q \mid T = t)$. In other words, the distribution of $Q^{(t)}$ converges to the (shifted and 
scaled) normal distribution $\mathcal{N}(t\sqrt{2/\pi},\, 1 - 2/\pi)$. Therefore, the distribution of $Q$ converges to a mixture of 
such distributions. Fix the constants 

$$c := \sqrt{2/\pi}; \qquad \sigma := \sqrt{1 - 2/\pi}.$$

Then the distribution of $Q$ converges to that of $V + cT$, where $V \sim \mathcal{N}(0,\sigma^2)$ is independent of $T$. Using the 
convergence, we can easily prove the following claim. 
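The half-normal moments and the conditional mean of $Q$ are easy to check by simulation. The sketch below is illustrative only (the function names and sample sizes are ours): it estimates $\mathbb{E}|W|$ and $\mathrm{Var}|W|$, and draws $Q$ conditioned on a fixed number of $+1$ coordinates in $Y$, which corresponds to a fixed value $t$ of $T$.

```python
import math
import random

C = math.sqrt(2 / math.pi)          # E|W| for W ~ N(0, 1)
SIGMA = math.sqrt(1 - 2 / math.pi)  # standard deviation of |W|

def half_normal_moments(samples, seed=0):
    """Empirical mean and variance of |W| with W ~ N(0, 1)."""
    rng = random.Random(seed)
    values = [abs(rng.gauss(0.0, 1.0)) for _ in range(samples)]
    mean = sum(values) / samples
    var = sum((v - mean) ** 2 for v in values) / samples
    return mean, var

def sample_q_given_t(n, n_plus, rng):
    """One draw of Q conditioned on exactly n_plus coordinates of Y being +1,
    i.e. on T = (2 * n_plus - n) / sqrt(n), with x taken to be all ones."""
    pos = sum(abs(rng.gauss(0.0, 1.0)) for _ in range(n_plus))
    neg = sum(abs(rng.gauss(0.0, 1.0)) for _ in range(n - n_plus))
    return (pos - neg) / math.sqrt(n)

if __name__ == "__main__":
    print(half_normal_moments(20000), (C, SIGMA ** 2))
    rng = random.Random(1)
    draws = [sample_q_given_t(100, 60, rng) for _ in range(2000)]
    print(sum(draws) / len(draws), C * 2.0)  # here t = 20 / 10 = 2
```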

Claim 5.3. For sufficiently large $n$, we have $p^* = \Pr[|Q| > (c+\alpha)b] \le 2\Pr[|V + cT| > (c+\alpha)b]$. □ 

Recalling that $x \in C$, and using (19), we have $\Pr[|T| > b] < 2\varepsilon$. This lets us upper bound $p^*$ as follows. 

$$\begin{aligned}
\tfrac{1}{2}\,p^* &\le \Pr[|V + cT| > (c+\alpha)b] \\
&\le \Pr[|V| > \alpha b] + \Pr[|T| > b] \\
&\le 2\Pr[V/\sigma > (\alpha/\sigma)b] + 2\varepsilon \\
&= 2\,\mathrm{tail}((\alpha/\sigma)b) + 2\,\mathrm{tail}(2.01b),
\end{aligned} \tag{27}$$

where in the last step we use the definition of $\varepsilon$ as given in (18). 



Completing the Proof 

To complete the proof of the anti-concentration lemma, we combine the lower bound (26) with the upper 
bound (27) to obtain 

$$\frac{1-\delta^{1/4}}{2}\left(\mathrm{tail}\!\left(\frac{(c+\alpha)b}{1-4\sqrt{\delta}}\right) - 2\delta^{1/8}\right) \le 4\,\mathrm{tail}\!\left(\frac{\alpha b}{\sigma}\right) + 4\,\mathrm{tail}(2.01b).$$

Recall that we had started by assuming the negation of Lemma 4.1 in Eqs. (17) and (18). Thus, the above 
inequality is supposed to hold for some constant $b \ge 66$ and all constants $\delta > 0$. However, if we set $\alpha = 2.01\sigma$, 
we can get a contradiction: as $\delta \to 0$, the left-hand side approaches $\frac{1}{2}\mathrm{tail}((c + 2.01\sigma)b)$, whereas the right-hand 
side is $8\,\mathrm{tail}(2.01b)$. Plugging in the values of $c$ and $\sigma$, we note that $c + 2.01\sigma < 2.01$. Therefore, if 
we choose $\delta$ small enough, we have a contradiction. 
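The numerical claims in this last step can be verified directly. The sketch below assumes that $\mathrm{tail}(x)$ denotes the upper tail $\Pr[N(0,1) > x]$ of the standard normal, consistent with how it is used above. Because $\mathrm{tail}(2.01 \cdot 66)$ underflows in floating point, the comparison is done in log-space using the standard bounds $\frac{x}{1+x^2}\varphi(x) \le \mathrm{tail}(x) \le \frac{1}{x}\varphi(x)$ for $x > 0$, where $\varphi$ is the normal density. All function names are ours.

```python
import math

C = math.sqrt(2 / math.pi)
SIGMA = math.sqrt(1 - 2 / math.pi)

def log_tail_lower(x):
    """ln of the lower bound x / (1 + x^2) * phi(x) on tail(x), for x > 0."""
    return (math.log(x / (1 + x * x)) - 0.5 * x * x
            - 0.5 * math.log(2 * math.pi))

def log_tail_upper(x):
    """ln of the upper bound phi(x) / x on tail(x), for x > 0."""
    return -math.log(x) - 0.5 * x * x - 0.5 * math.log(2 * math.pi)

def contradiction_margin(b):
    """A lower bound on ln(LHS limit) - ln(RHS) with alpha = 2.01 * sigma,
    where LHS limit = tail((c + 2.01*sigma) * b) / 2 (the delta -> 0 limit)
    and RHS = 8 * tail(2.01 * b). A positive value means LHS limit > RHS."""
    lhs = math.log(0.5) + log_tail_lower((C + 2.01 * SIGMA) * b)
    rhs = math.log(8.0) + log_tail_upper(2.01 * b)
    return lhs - rhs

if __name__ == "__main__":
    print(C + 2.01 * SIGMA)          # slightly below 2.01
    print(contradiction_margin(66))  # positive at b = 66
```

The gap between $c + 2.01\sigma$ and $2.01$ is tiny, so the exponent difference only overcomes the constant factor $16$ once $b$ is fairly large, which is consistent with the lemma being stated for $b \ge 66$.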